US 12,437,754 B2
Multiple wakeword detection
Gengshen Fu, Sharon, MA (US); Huitian Lei, Medford, MA (US); Sai Kiran Venkata Subramanya Rupanagudi, Burien, WA (US); Yuriy Mishchenko, Lexington, MA (US); and Cody Jacques, Somerville, MA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 10, 2021, as Appl. No. 17/547,547.
Prior Publication US 2023/0186902 A1, Jun. 15, 2023
Int. Cl. G10L 15/16 (2006.01); G10L 15/22 (2006.01); G10L 15/08 (2006.01)

CPC G10L 15/16 (2013.01) [G10L 15/22 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01)]

22 Claims

1. A computer-implemented method comprising:

operating a first device using a first wakeword component configured to detect a first wakeword in received audio, the first wakeword component comprising a first convolutional neural network (CNN) encoder and a first convolutional recurrent neural network (CRNN) decoder corresponding to the first wakeword;

determining a request to configure the first device to detect a second wakeword;

determining updated wakeword component data comprising:

first data representing the first CNN encoder,

second data representing the first CRNN decoder, and

third data representing a second CRNN decoder corresponding to the second wakeword;

sending the updated wakeword component data to the first device;

using the updated wakeword component data to configure an updated wakeword component for operation by the first device;

receiving, by the first device, input audio data representing an utterance;

processing the input audio data using the first CNN encoder to determine first encoded audio data comprising a plurality of feature vectors representing acoustic units of the utterance;

processing the first encoded audio data using the first CRNN decoder to determine a first likelihood the utterance included the first wakeword;

processing the first encoded audio data using the second CRNN decoder to determine a second likelihood the utterance included the second wakeword; and

based at least in part on the first likelihood or the second likelihood, causing speech processing to be performed using data representing the utterance.
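
The data flow recited in claim 1 can be illustrated with a minimal sketch: one shared encoder runs once per utterance, and each wakeword has its own decoder that consumes the same encoded features. Everything named below is a hypothetical stand-in introduced for illustration and is not taken from the patent: `toy_encoder` replaces the CNN encoder with simple per-frame statistics, `make_toy_decoder` replaces each CRNN decoder with a pooled logistic score, and the 0.5 threshold, the weights, and the sample audio values are arbitrary assumptions. The sketch only shows the structure (shared encoder, per-wakeword decoders, updated component data that reuses the existing encoder and decoder), not a real acoustic model.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def toy_encoder(audio: List[float], frame: int = 4) -> List[Vector]:
    """Hypothetical stand-in for the shared CNN encoder.

    Emits one feature vector [mean, energy] per non-overlapping frame.
    """
    features = []
    for start in range(0, len(audio) - frame + 1, frame):
        chunk = audio[start:start + frame]
        mean = sum(chunk) / frame
        energy = sum(x * x for x in chunk) / frame
        features.append([mean, energy])
    return features


def make_toy_decoder(weights: Vector, bias: float) -> Callable[[List[Vector]], float]:
    """Hypothetical stand-in for a per-wakeword CRNN decoder.

    Averages the feature vectors over time and applies a logistic score,
    returning a likelihood in (0, 1).
    """
    def decode(features: List[Vector]) -> float:
        if not features:
            return 0.0
        pooled = [sum(column) / len(features) for column in zip(*features)]
        score = sum(w * p for w, p in zip(weights, pooled)) + bias
        return 1.0 / (1.0 + math.exp(-score))
    return decode


@dataclass
class WakewordComponent:
    encoder: Callable[[List[float]], List[Vector]]
    decoders: Dict[str, Callable[[List[Vector]], float]] = field(default_factory=dict)
    threshold: float = 0.5  # assumed value, not from the patent

    def detect(self, audio: List[float]) -> Tuple[Dict[str, float], bool]:
        # The encoder runs once; every decoder reuses the same encoded audio.
        encoded = self.encoder(audio)
        likelihoods = {name: dec(encoded) for name, dec in self.decoders.items()}
        # Speech processing would be triggered if any likelihood passes the threshold.
        triggered = any(p >= self.threshold for p in likelihoods.values())
        return likelihoods, triggered


def build_updated_component(current: WakewordComponent, new_name: str,
                            new_decoder: Callable[[List[Vector]], float]) -> WakewordComponent:
    """Mirror the updated component data: same encoder, existing decoder(s),
    plus one decoder for the newly requested wakeword."""
    decoders = dict(current.decoders)
    decoders[new_name] = new_decoder
    return WakewordComponent(current.encoder, decoders, current.threshold)


# Usage: start with one wakeword, then add a second without touching the encoder.
first = WakewordComponent(toy_encoder, {"first_wakeword": make_toy_decoder([0.0, 4.0], -2.0)})
updated = build_updated_component(first, "second_wakeword", make_toy_decoder([3.0, 0.0], -1.0))

audio = [0.9, 1.0, 1.1, 1.0, 0.8, 1.0, 1.2, 1.0]  # made-up samples
likelihoods, triggered = updated.detect(audio)
silence_likelihoods, silence_triggered = updated.detect([0.0] * 8)
```

Because the encoder is shared, adding a wakeword in this sketch costs only one additional decoder rather than a second full detector, which is the structural point the claim turns on.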