US 12,243,563 B2
Voice-controlled content creation
Wenqing Jiang, Los Angeles, CA (US); Serhan Uslubas, Los Angeles, CA (US); Zheng Li, Culver City, CA (US); Ming Tu, Los Angeles, CA (US); and Shiva Shanker Pandiri, Los Angeles, CA (US)
Assigned to Lemon Inc., Grand Cayman (KY)
Filed by Lemon Inc., Grand Cayman (KY)
Filed on Jun. 10, 2022, as Appl. No. 17/838,022.
Prior Publication US 2023/0402068 A1, Dec. 14, 2023
Int. Cl. G11B 27/34 (2006.01); G06F 3/16 (2006.01); G10L 15/04 (2013.01); G10L 15/22 (2006.01); G10L 25/57 (2013.01); G11B 27/031 (2006.01); G11B 27/036 (2006.01); H04N 5/76 (2006.01); H04N 23/60 (2023.01); H04N 23/62 (2023.01); H04N 23/63 (2023.01)

CPC G11B 27/34 (2013.01) [G10L 15/04 (2013.01); G10L 15/22 (2013.01); G10L 25/57 (2013.01); G11B 27/036 (2013.01); H04N 5/76 (2013.01); G10L 2015/223 (2013.01)]

18 Claims

1. A method of voice-controlled content creation, comprising:

causing to display an interface of a content application comprising a first interface element;

initiating a preview mode and causing to display a second interface element in response to receiving a selection of the first interface element, wherein the selection of the first interface element indicates an intent to create a content;

initiating a listening mode in response to receiving a selection of the second interface element, wherein the selection of the second interface element indicates a satisfaction of a background or scenery displayed in the preview mode, and wherein initiating the listening mode comprises activating a voice recognition process configured to listen for keywords captured by a voice input;

causing to display a third interface element indicative of the listening mode instead of the second interface element at approximately the same location where the second interface element was displayed;

monitoring voice commands spoken by a creator;

initiating recording the content by a camera in response to recognizing a first voice command spoken by the creator;

during recording the content, causing to control operations associated with the camera in response to recognizing a plurality of voice commands, wherein the operations comprise zooming the camera in or out, focusing the camera, and switching to another camera;

during recording the content, causing to add at least one visual effect or aural effect into the content in response to recognizing a second voice command spoken by the creator;

stopping recording the content in response to recognizing a third voice command spoken by the creator;

creating a timestamp associated with the third voice command; and

automatically deleting a segment from the content based on the timestamp, wherein the segment comprises a recording of the third voice command.
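The steps of claim 1 can be read as a small state machine. The Python sketch below is illustrative only and is not the patentee's implementation. It assumes a speech recognizer, which is not implemented here, that delivers each recognized phrase together with the time its utterance began. The class and mode names, the element labels (`create_button`, `confirm_button`, `listening_indicator`), the command keywords, and the 2x zoom step are all hypothetical. The sketch shows two things. First, the displayed element changes from the first to the second to the third interface element. Second, the recording is trimmed at the stop command's timestamp, so frames captured while the creator was saying "stop recording" are deleted.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class Mode(Enum):
    IDLE = auto()       # first interface element shown (hypothetical "create" button)
    PREVIEW = auto()    # camera preview plus the second interface element
    LISTENING = auto()  # voice recognition active; third element replaces the second
    RECORDING = auto()
    STOPPED = auto()


@dataclass
class Session:
    """Hypothetical controller sketching the flow recited in claim 1."""
    mode: Mode = Mode.IDLE
    element: str = "create_button"
    zoom: float = 1.0
    camera: str = "rear"
    focused: bool = False
    effects: List[str] = field(default_factory=list)
    frame_times: List[float] = field(default_factory=list)
    stop_timestamp: Optional[float] = None

    def tap(self, element: str) -> None:
        # First element selected: the creator intends to create content.
        if self.mode is Mode.IDLE and element == "create_button":
            self.mode, self.element = Mode.PREVIEW, "confirm_button"
        # Second element selected: the creator accepts the previewed scenery.
        elif self.mode is Mode.PREVIEW and element == "confirm_button":
            # The listening indicator takes the confirm button's place.
            self.mode, self.element = Mode.LISTENING, "listening_indicator"

    def on_frame(self, t: float) -> None:
        # Camera frames are kept only while recording.
        if self.mode is Mode.RECORDING:
            self.frame_times.append(t)

    def on_command(self, text: str, t_start: float) -> None:
        """Handle a recognized phrase whose utterance began at t_start seconds."""
        text = text.lower().strip()
        if self.mode is Mode.LISTENING and text == "start recording":
            self.mode = Mode.RECORDING
        elif self.mode is Mode.RECORDING:
            if text == "zoom in":
                self.zoom *= 2.0  # assumed zoom step
            elif text == "zoom out":
                self.zoom = max(1.0, self.zoom / 2.0)
            elif text == "focus":
                self.focused = True
            elif text == "switch camera":
                self.camera = "front" if self.camera == "rear" else "rear"
            elif text.startswith("add effect "):
                self.effects.append(text[len("add effect "):])
            elif text == "stop recording":
                self.mode = Mode.STOPPED
                self.stop_timestamp = t_start
                # Delete the segment containing the spoken stop command.
                self.frame_times = [f for f in self.frame_times if f < t_start]


# Usage: frames arrive every 0.5 s; "stop recording" starts at 3.2 s but is
# only recognized after frames up to 4.5 s have already been captured.
s = Session()
s.tap("create_button")
s.tap("confirm_button")
s.on_command("start recording", 0.0)
for i in range(10):
    s.on_frame(i * 0.5)
    if i == 2:
        s.on_command("zoom in", 1.0)
s.on_command("add effect sparkle", 2.0)
s.on_command("stop recording", 3.2)
print(s.frame_times)  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

Timestamping the start of the utterance, rather than the moment recognition completes, is what lets the trim remove the spoken command itself. Recognition necessarily finishes after the creator has stopped speaking.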