US 12,236,950 B2
Device-directed utterance detection
Ariya Rastrow, Seattle, WA (US); Eli Joshua Fidler, Toronto (CA); Roland Maximilian Rolf Maas, Seattle, WA (US); Nikko Strom, Kirkland, WA (US); Aaron Eakin, Seattle, WA (US); Diamond Bishop, Seattle, WA (US); Bjorn Hoffmeister, Seattle, WA (US); and Sanjeev Mishra, Bellevue, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jan. 3, 2023, as Appl. No. 18/149,181.
Application 18/149,181 is a continuation of application No. 16/822,744, filed on Mar. 18, 2020, granted, now Pat. No. 11,551,685.
Prior Publication US 2023/0223023 A1, Jul. 13, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/22 (2006.01); G10L 15/18 (2013.01); G10L 15/26 (2006.01); G10L 15/08 (2006.01)
CPC G10L 15/22 (2013.01) [G10L 15/1815 (2013.01); G10L 15/26 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01); G10L 2015/228 (2013.01)]
21 Claims
 
1. A computer-implemented method, comprising:
generating first output audio using a loudspeaker associated with a device;
receiving first audio data;
processing the first audio data using a first component of the device to determine that the first audio data represents first speech;
in response to determining that the first speech is represented in the first audio data, performing a first action;
determining, by a natural language processing component, first natural language processing data associated with the first speech;
providing the first audio data and the first natural language processing data as inputs to a machine learning component, the machine learning component being configured to classify input data as corresponding to a device-directed speech event;
determining, using the machine learning component, that the first audio data and the first natural language processing data correspond to a first device-directed speech event; and
based at least in part on the first audio data and the first natural language processing data corresponding to the first device-directed speech event, causing natural language processing to be completed based on the first audio data.
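
The ordering of steps in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`NLPData`, `detect_speech`, `run_flow`), the energy-based speech check, the score threshold, and the "discard" branch are assumptions made for illustration only. The claim does not specify what the first action is, what the natural language processing data contains, or how the machine learning component is built, so those parts are passed in as placeholder callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class NLPData:
    # Hypothetical container; the claim does not say what this data holds.
    transcript: str
    confidence: float


def detect_speech(audio: Sequence[float], energy_threshold: float = 0.01) -> bool:
    # Toy stand-in for the "first component": a mean-energy check.
    # This detector is an assumption, not the one described in the patent.
    if not audio:
        return False
    return sum(s * s for s in audio) / len(audio) > energy_threshold


def run_flow(
    audio: Sequence[float],
    nlp_component: Callable[[Sequence[float]], NLPData],
    classifier: Callable[[Sequence[float], NLPData], float],
    threshold: float = 0.5,  # assumed decision threshold
) -> List[str]:
    """Return the names of the steps executed, in order."""
    # Output audio is generated and input audio is received.
    steps = ["generate_output_audio", "receive_audio"]

    # The first component decides whether the audio represents speech.
    if not detect_speech(audio):
        return steps

    # A first action is performed in response (unspecified in the claim).
    steps.append("first_action")

    # The NLP component determines NLP data associated with the speech.
    nlp_data = nlp_component(audio)
    steps.append("determine_nlp_data")

    # Audio and NLP data are both given to the machine learning component.
    score = classifier(audio, nlp_data)

    # Device-directed speech causes NLP to be completed; the "discard"
    # branch is an assumption, since the claim covers only the positive case.
    steps.append("complete_nlp" if score >= threshold else "discard")
    return steps


if __name__ == "__main__":
    loud = [0.5, -0.5, 0.5, -0.5]
    stub_nlp = lambda a: NLPData("what time is it", 0.9)
    print(run_flow(loud, stub_nlp, lambda a, d: 0.9))
```

Passing the detector outputs in as callables keeps the sketch independent of any particular speech recognizer or classifier, so only the control flow recited in the claim is shown.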