US 12,424,209 B1
Natural language processing
Chenlei Guo, Redmond, WA (US); Xing Fan, Redmond, WA (US); Bharath Bhimanaik Kumar, Sammamish, WA (US); Kerry Hammil, Bainbridge Island, WA (US); Dinesh Malla, Sunnyvale, CA (US); Puyang Xu, Issaquah, WA (US); and Sixing Lu, Bellevue, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jul. 31, 2023, as Appl. No. 18/362,632.
Int. Cl. G10L 15/183 (2013.01); G10L 15/30 (2013.01)

CPC G10L 15/183 (2013.01) [G10L 15/30 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

receiving first audio data corresponding to a first spoken input;

performing automatic speech recognition (ASR) processing on the first audio data to generate a first transcript of the first spoken input;

determining context data associated with the first spoken input, the context data representing a previous spoken input of a user associated with the first spoken input;

determining a first prompt including the first transcript and the context data, wherein the first prompt is a first instruction to determine at least one task associated with performing an action responsive to the first spoken input;

processing, using a first language model, the first prompt to generate first output data indicating a first task;

retrieving, from a storage, a first set of component descriptions associated with the first task, wherein the first set of component descriptions represent one or more functions performable by at least a first component, a second component, and a third component;

determining a second prompt including the first transcript, the context data, the first task, and the first set of component descriptions, wherein the second prompt is a second instruction for a second language model to generate instructions usable to cause one or more of the first component, the second component, and the third component to process with respect to the first task;

processing, using the second language model, the second prompt to:

generate a first application programming interface (API) call requesting that the first component process with respect to the first task, and

generate a second API call requesting that the second component process with respect to the first task;

using the first API call, causing the first component to generate second output data indicating a first function performable by the first component with respect to the first task;

using the second API call, causing the second component to generate third output data indicating a second function performable by the second component with respect to the first task;

determining the first function, instead of the second function, is responsive to the first task; and

causing the first component to perform the first function.
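
The flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify model behavior, prompt wording, API call format, or how responsiveness is decided. Every name below (Component, first_model, second_model, score, handle_turn, STORE) is hypothetical, both language models are replaced by simple stand-in functions, the selection step uses a token-overlap score purely as an assumption, and ASR is assumed to have already produced the transcript.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Component:
    name: str
    description: str
    propose: Callable[[str], str]  # task -> function the component could perform
    perform: Callable[[str], str]  # function -> result of performing it


def first_model(prompt: str) -> str:
    """Stand-in for the first language model: maps a prompt to a task label."""
    return "play_music" if "play" in prompt.lower() else "general_query"


def second_model(prompt: str, task: str, names: List[str]) -> List[Dict[str, str]]:
    """Stand-in for the second language model: emits an API call (a plain dict
    here) for at most two of the components described in the prompt."""
    described = [n for n in names if n in prompt]
    return [{"component": n, "task": task} for n in described[:2]]


def score(task: str, function: str) -> int:
    """Assumed responsiveness measure: tokens shared by task and function."""
    return len(set(task.split("_")) & set(function.split("_")))


def handle_turn(transcript: str, context: str,
                store: Dict[str, List[Component]]) -> str:
    # First prompt: transcript plus context, asking for the task.
    prompt1 = (f"Determine the task.\nInput: {transcript}\n"
               f"Previous input: {context}")
    task = first_model(prompt1)

    # Retrieve the component descriptions stored for that task.
    components = store[task]
    descriptions = "\n".join(f"{c.name}: {c.description}" for c in components)

    # Second prompt: transcript, context, task, and component descriptions.
    prompt2 = (f"Generate API calls.\nInput: {transcript}\n"
               f"Previous input: {context}\nTask: {task}\n"
               f"Components:\n{descriptions}")
    calls = second_model(prompt2, task, [c.name for c in components])

    # Each called component reports a function it could perform for the task.
    by_name = {c.name: c for c in components}
    proposals = [(by_name[call["component"]],
                  by_name[call["component"]].propose(call["task"]))
                 for call in calls]

    # Pick the most responsive function and have its component perform it.
    chosen, function = max(proposals, key=lambda p: score(task, p[1]))
    return chosen.perform(function)


# Hypothetical storage: three described components for one task.
STORE = {
    "play_music": [
        Component("music_player", "streams songs",
                  lambda task: "play_music_stream",
                  lambda fn: f"music_player ran {fn}"),
        Component("radio", "tunes live stations",
                  lambda task: "tune_radio_station",
                  lambda fn: f"radio ran {fn}"),
        Component("podcast", "plays podcast episodes",
                  lambda task: "play_podcast_episode",
                  lambda fn: f"podcast ran {fn}"),
    ]
}

if __name__ == "__main__":
    print(handle_turn("play some jazz", "what is on tonight", STORE))
```

In this sketch the third component is described in the second prompt but receives no API call, mirroring the claim, in which API calls are generated only for the first and second components. A real system would replace the stand-in functions with actual model inference and a defined API call format.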