CPC G10L 15/22 (2013.01) [G10L 15/1815 (2013.01); G10L 15/26 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01); G10L 2015/228 (2013.01)] | 21 Claims |
1. A computer-implemented method, comprising:
generating first output audio using a loudspeaker associated with a device;
receiving first audio data;
processing the first audio data using a first component of the device to determine that the first audio data represents first speech;
in response to determining that the first speech is represented in the first audio data, performing a first action;
determining, by a natural language processing component, first natural language processing data associated with the first speech;
providing the first audio data and the first natural language processing data as inputs to a machine learning component, the machine learning component being configured to classify input data as corresponding to a device-directed speech event;
determining, using the machine learning component, that the first audio data and the first natural language processing data correspond to a first device-directed speech event; and
based at least in part on the first audio data and the first natural language processing data corresponding to the first device-directed speech event, causing natural language processing to be completed based on the first audio data.
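The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every name here (`NlpData`, `detect_speech`, `classify_device_directed`, `handle_audio`) is hypothetical, the energy check is only a stand-in for the "first component", the fixed logistic score is only a stand-in for the trained "machine learning component", and the "first action" is recorded as an event label because the claim does not say what that action is.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class NlpData:
    """Hypothetical container for the 'first natural language processing data'."""
    text: str
    intent_confidence: float  # assumed feature in [0, 1]; not specified by the claim


def mean_energy(audio: List[float]) -> float:
    """Mean squared amplitude of the samples (0.0 for empty input)."""
    return sum(s * s for s in audio) / len(audio) if audio else 0.0


def detect_speech(audio: List[float], energy_threshold: float = 0.01) -> bool:
    """Stand-in for the 'first component': a simple energy threshold (assumption)."""
    return mean_energy(audio) > energy_threshold


def classify_device_directed(audio: List[float], nlp: NlpData,
                             threshold: float = 0.5) -> bool:
    """Stand-in for the 'machine learning component'.

    Takes both the audio and the NLP data as inputs, as the claim requires,
    and scores them with fixed, made-up logistic weights.
    """
    z = 4.0 * nlp.intent_confidence + 2.0 * min(mean_energy(audio), 1.0) - 2.5
    score = 1.0 / (1.0 + math.exp(-z))
    return score >= threshold


def handle_audio(audio: List[float],
                 run_nlp: Callable[[List[float]], NlpData],
                 events: List[str]) -> Optional[str]:
    """Walk through the claimed steps for one chunk of received audio."""
    events.append("output_audio")        # device is generating output audio
    if not detect_speech(audio):         # first component: is speech present?
        return None
    events.append("first_action")        # unspecified 'first action' (placeholder)
    nlp = run_nlp(audio)                 # NLP data associated with the speech
    if classify_device_directed(audio, nlp):
        events.append("complete_nlp")    # device-directed: finish NLP
        return nlp.text
    events.append("discard")             # not device-directed: stop here
    return None
```

A call such as `handle_audio([0.5] * 100, lambda a: NlpData("turn it down", 0.9), [])` exercises the device-directed path. Passing silent audio returns `None` before any natural language processing is attempted.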