US 12,475,890 B1
Voice-based interactions with a graphical user interface
Senthil Kumar Dayalan, Chennai (IN); Manikandan Thangarathnam, Chennai (IN); Sai Vinayak, Chennai (IN); and Suraj Gopalakrishnan, Chennai (IN)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 18, 2023, as Appl. No. 18/543,946.
Application 18/543,946 is a continuation of application No. 16/903,698, filed on Jun. 17, 2020, granted, now Pat. No. 11,887,589.
Int. Cl. G10L 15/22 (2006.01); G06F 3/0482 (2013.01); G06F 3/16 (2006.01); G10L 15/06 (2013.01); G10L 15/18 (2013.01); G10L 15/26 (2006.01)

CPC G10L 15/22 (2013.01) [G06F 3/0482 (2013.01); G06F 3/167 (2013.01); G10L 15/063 (2013.01); G10L 15/1815 (2013.01); G10L 15/26 (2013.01); G10L 2015/0638 (2013.01); G10L 2015/223 (2013.01)]

20 Claims

1. A device comprising:

a microphone;

a processor; and

a memory storing instructions that, upon execution by the processor, configure the device to:

present content on a display;

detect, by at least using the microphone, a natural language input requesting an operation associated with at least a portion of the content;

generate data that represents the natural language input;

generate an input to a language model based at least in part on the data and whether the content is in view or out of view;

determine, based at least in part on the input, an output of the language model, wherein the output indicates a command to perform and is based at least in part on the data and whether the content is in view or out of view; and

present, on the display, an outcome of performing the command.
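
For illustration only, and not as a limitation or interpretation of the claim, the flow recited in claim 1 might be sketched as follows. The sketch assumes the natural language input has already been converted to text, and it substitutes a simple rule-based function for the language model. All names here (ContentItem, build_model_input, toy_language_model, handle_utterance) and the command strings are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ContentItem:
    """A piece of presented content (hypothetical structure)."""
    label: str
    in_view: bool  # True if the item is currently visible on the display


def build_model_input(utterance_text: str, items: List[ContentItem]) -> Dict:
    """Combine the data representing the utterance with view state.

    The model input depends on both the utterance and on whether each
    content item is in view or out of view.
    """
    return {
        "utterance": utterance_text,
        "in_view": [item.label for item in items if item.in_view],
        "out_of_view": [item.label for item in items if not item.in_view],
    }


def toy_language_model(model_input: Dict) -> Dict[str, Optional[str]]:
    """Rule-based stand-in for a language model (an assumption).

    Returns a command whose form depends on whether the referenced
    content is in view or out of view.
    """
    text = model_input["utterance"].lower()
    for label in model_input["in_view"]:
        if label.lower() in text:
            return {"command": "select", "target": label}
    for label in model_input["out_of_view"]:
        if label.lower() in text:
            # Out-of-view content needs to be brought into view first.
            return {"command": "scroll_to_and_select", "target": label}
    return {"command": "none", "target": None}


def handle_utterance(
    utterance_text: str,
    items: List[ContentItem],
    model: Callable[[Dict], Dict] = toy_language_model,
) -> Dict[str, Optional[str]]:
    """Generate the model input, then determine the command to perform."""
    model_input = build_model_input(utterance_text, items)
    return model(model_input)
```

Under these assumptions, calling handle_utterance("open drama", items) with a "Drama" item marked out of view yields a scroll-then-select command, while the same request for an in-view item yields a plain select command. Presenting the outcome on the display is left out of the sketch.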