| CPC G10L 15/26 (2013.01) [G10L 13/08 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
receiving input audio data representing a first spoken input;
performing automatic speech recognition (ASR) processing using the input audio data to generate a transcript of the first spoken input;
based on the transcript, determining a first set of actions potentially corresponding to the first spoken input;
based on the transcript and the first set of actions, determining first exemplar data, the first exemplar data including a first example user input similar to the first spoken input, a first action of the first set of actions to be performed in response to the first example user input, and a system response to be generated in response to the first example user input;
determining the first action is associated with performance of an action using a device;
based on determining the first action is associated with performance of the action using the device:
determining a user profile associated with the first spoken input,
determining a first device associated with the user profile, wherein the first device is capable of performing the first action, and
determining first state data representing a first device state of the first device;
determining a first prompt including the first set of actions, the first exemplar data, the first device state, and the transcript, wherein the first prompt is an input for a language model to determine an output responsive to the first spoken input;
processing, using the language model, the first prompt to generate first output data indicating the first action is to be performed in response to the first spoken input and a first natural language response is to be presented;
causing the first device to perform the first action;
receiving first response data indicating performance of the first action; and
based at least in part on receiving the first response data, causing presentation of the first natural language response.
|
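Claim 1 recites the pipeline only as functional steps. Below is a minimal Python sketch of one turn, built on stated assumptions. ASR has already produced the transcript, and the candidate actions are given as input. Word-level Jaccard overlap is swapped in as a stand-in for whatever similarity measure selects the exemplar data. The prompt layout and the `ACTION=...; RESPONSE=...` output format are invented for illustration. `fake_model` and `fake_perform` are stubs that stand in for the language model and the device. All class and function names (`Exemplar`, `Device`, `select_exemplar`, `build_prompt`, `parse_output`, `handle_turn`) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Exemplar:
    # Hypothetical exemplar: an example input, its action, and its system response.
    example_input: str
    action: str
    system_response: str


@dataclass
class Device:
    # Hypothetical device record: capabilities plus current state data.
    device_id: str
    capabilities: Set[str]
    state: Dict[str, str]


def select_exemplar(transcript: str, actions: List[str],
                    exemplars: List[Exemplar]) -> Exemplar:
    """Among exemplars whose action is a candidate, return the one whose example
    input has the highest word-level Jaccard similarity to the transcript
    (an assumed stand-in similarity measure)."""
    words = set(transcript.lower().split())

    def jaccard(ex: Exemplar) -> float:
        other = set(ex.example_input.lower().split())
        union = words | other
        return len(words & other) / len(union) if union else 0.0

    return max((ex for ex in exemplars if ex.action in actions), key=jaccard)


def build_prompt(transcript: str, actions: List[str], exemplar: Exemplar,
                 device_state: Dict[str, str]) -> str:
    """Combine candidate actions, exemplar, device state, and transcript."""
    state = ", ".join(f"{k}={v}" for k, v in sorted(device_state.items()))
    return "\n".join([
        "Candidate actions: " + ", ".join(actions),
        f"Example input: {exemplar.example_input}",
        f"Example action: {exemplar.action}",
        f"Example response: {exemplar.system_response}",
        f"Device state: {state}",
        f"User input: {transcript}",
        "Answer as: ACTION=<action>; RESPONSE=<text>",
    ])


def parse_output(text: str) -> Tuple[str, str]:
    """Split model output of the assumed form 'ACTION=...; RESPONSE=...'."""
    action_part, response_part = text.split(";", 1)
    return (action_part.split("=", 1)[1].strip(),
            response_part.split("=", 1)[1].strip())


def handle_turn(transcript: str, actions: List[str], exemplars: List[Exemplar],
                profiles: Dict[str, List[Device]], user_id: str,
                language_model: Callable[[str], str],
                perform: Callable[[Device, str], bool]) -> Optional[str]:
    """Run one turn; return the response to present, or None."""
    exemplar = select_exemplar(transcript, actions, exemplars)
    # Device path: find a device on the user's profile that can perform the action.
    device = next((d for d in profiles.get(user_id, [])
                   if exemplar.action in d.capabilities), None)
    if device is None:
        return None  # the non-device path is not sketched here
    prompt = build_prompt(transcript, actions, exemplar, device.state)
    action, response = parse_output(language_model(prompt))
    # Present the response only after the device reports the action was performed.
    return response if perform(device, action) else None


# Usage with stubbed model and device; ASR is assumed to have produced the transcript.
def fake_model(prompt: str) -> str:
    return "ACTION=turn_on_light; RESPONSE=Okay, turning on the kitchen light."


def fake_perform(device: Device, action: str) -> bool:
    if action not in device.capabilities:
        return False
    device.state["power"] = "on"
    return True


kitchen = Device("kitchen-light", {"turn_on_light"}, {"power": "off"})
thermostat = Device("thermostat", {"set_temperature"}, {"temp": "68"})
exemplars = [Exemplar("turn on the lights", "turn_on_light", "Okay."),
             Exemplar("set thermostat to 70", "set_temperature", "Done.")]
result = handle_turn("turn on the kitchen light",
                     ["turn_on_light", "set_temperature"], exemplars,
                     {"user-1": [thermostat, kitchen]}, "user-1",
                     fake_model, fake_perform)
print(result)  # Okay, turning on the kitchen light.
```

The final `return` in `handle_turn` mirrors the ordering recited in the claim: the natural language response is presented only after response data indicates that the device performed the action.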