US 12,462,805 B2
Natural language generation
Hann Wang, Santa Clara, CA (US); Angeliki Metallinou, Mountain View, CA (US); Melanie C B Gens, Honolulu, HI (US); Arijit Biswas, Dublin, CA (US); and Ying Shi, Bellevue, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 30, 2023, as Appl. No. 18/345,455.
Prior Publication US 2025/0006196 A1, Jan. 2, 2025
Int. Cl. G10L 15/26 (2006.01); G10L 13/08 (2013.01)
CPC G10L 15/26 (2013.01) [G10L 13/08 (2013.01)]
20 Claims
 
1. A computer-implemented method comprising:
receiving input audio data representing a first spoken input;
performing automatic speech recognition (ASR) processing using the input audio data to generate a transcript of the first spoken input;
based on the transcript, determining a first set of actions potentially corresponding to the first spoken input;
based on the transcript and the first set of actions, determining first exemplar data, the first exemplar data including a first example user input similar to the first spoken input, a first action of the first set of actions to be performed in response to the first example user input, and a system response to be generated in response to the first example user input;
determining the first action is associated with performance of an action using a device;
based on determining the first action is associated with performance of the action using the device:
determining a user profile associated with the first spoken input,
determining a first device associated with the user profile, wherein the first device is capable of performing the first action, and
determining first state data representing a first device state of the first device;
determining a first prompt including the first set of actions, the first exemplar data, the first device state, and the transcript, wherein the first prompt is an input for a language model to determine an output responsive to the first spoken input;
processing, using the language model, the first prompt to generate first output data indicating the first action is to be performed in response to the first spoken input and a first natural language response is to be presented;
causing the first device to perform the first action;
receiving first response data indicating performance of the first action; and
based at least in part on receiving the first response data, causing presentation of the first natural language response.
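
The steps of claim 1 that follow ASR can be illustrated with a minimal sketch. It is not the patented implementation: every name here (Exemplar, ModelOutput, build_prompt, handle_turn, the callables passed in, and the prompt layout) is a hypothetical stand-in chosen for illustration, and the language model, device control, and presentation components are supplied by the caller as plain functions. The sketch only shows the ordering the claim recites: fetch device state when the exemplar's action is device-related, assemble the prompt from the action set, exemplar, device state, and transcript, run the model, perform the action, and present the response only after response data confirms performance.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Exemplar:
    """Hypothetical container for the claim's 'exemplar data'."""
    example_input: str    # example user input similar to the spoken input
    action: str           # action to perform for that example input
    system_response: str  # system response for that example input


@dataclass
class ModelOutput:
    """Hypothetical structured output of the language model."""
    action: str
    response: str


def build_prompt(transcript: str, actions: list, exemplar: Exemplar,
                 device_state: Optional[dict]) -> str:
    """Assemble a prompt from the four inputs named in the claim.

    The textual layout is an assumption; the claim does not specify one.
    """
    parts = [
        "Available actions: " + ", ".join(actions),
        f"Example input: {exemplar.example_input}",
        f"Example action: {exemplar.action}",
        f"Example response: {exemplar.system_response}",
    ]
    if device_state is not None:
        parts.append("Device state: " + json.dumps(device_state, sort_keys=True))
    parts.append(f"User input: {transcript}")
    return "\n".join(parts)


def handle_turn(transcript: str, actions: list, exemplar: Exemplar,
                device_actions: set,
                get_device_state: Callable[[], dict],
                language_model: Callable[[str], ModelOutput],
                perform_action: Callable[[str], dict],
                present: Callable[[str], None]) -> str:
    """Run one turn and return the prompt that was sent to the model."""
    # Device state is looked up only when the action involves a device.
    device_state = get_device_state() if exemplar.action in device_actions else None
    prompt = build_prompt(transcript, actions, exemplar, device_state)
    output = language_model(prompt)
    # Cause the device to perform the action, then wait for response data.
    response_data = perform_action(output.action)
    # Present the natural language response only after performance is confirmed.
    if response_data.get("performed"):
        present(output.response)
    return prompt
```

In this sketch the user profile and device lookup are folded into the caller-supplied get_device_state function, and the "performed" key in the response data is an assumed convention rather than anything stated in the claim.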