| CPC G10L 15/16 (2013.01) [G10L 15/22 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01)] | 22 Claims |

|
1. A computer-implemented method comprising:
operating a first device using a first wakeword component configured to detect a first wakeword in received audio, the first wakeword component comprising a first convolutional neural network (CNN) encoder and a first convolutional recurrent neural network (CRNN) decoder corresponding to the first wakeword;
determining a request to configure the first device to detect a second wakeword;
determining updated wakeword component data comprising:
first data representing the first CNN encoder,
second data representing the first CRNN decoder, and
third data representing a second CRNN decoder corresponding to the second wakeword;
sending the updated wakeword component data to the first device;
using the updated wakeword component data to configure an updated wakeword component for operation by the first device;
receiving, by the first device, input audio data representing an utterance;
processing the input audio data using the first CNN encoder to determine first encoded audio data comprising a plurality of feature vectors representing acoustic units of the utterance;
processing the first encoded audio data using the first CRNN decoder to determine a first likelihood the utterance included the first wakeword;
processing the first encoded audio data using the second CRNN decoder to determine a second likelihood the utterance included the second wakeword; and
based at least in part on the first likelihood or the second likelihood, causing speech processing to be performed using data representing the utterance.
|
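The arrangement recited in claim 1 (one shared encoder feeding a separate decoder per wakeword, with a second decoder added while the first encoder and first decoder are retained) can be illustrated with a minimal sketch. Everything named here is hypothetical and is not drawn from the claim: `toy_encoder` and `make_toy_decoder` are simple arithmetic placeholders standing in for the CNN encoder and the CRNN decoders, `WakewordComponent` and `with_wakeword` are illustrative names, and the fixed `threshold` is only one assumed way of acting "based at least in part on" the likelihoods.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

FeatureVectors = List[List[float]]
Encoder = Callable[[Sequence[float]], FeatureVectors]
Decoder = Callable[[FeatureVectors], float]


def toy_encoder(audio: Sequence[float], frame: int = 4) -> FeatureVectors:
    """Placeholder for the shared CNN encoder (assumption, not a real CNN).

    Emits one [mean, peak] feature vector per frame of samples.
    """
    frames = [audio[i:i + frame] for i in range(0, len(audio), frame)]
    return [[sum(f) / len(f), max(abs(x) for x in f)] for f in frames if f]


def make_toy_decoder(weight: float) -> Decoder:
    """Placeholder for a per-wakeword CRNN decoder (assumption, not a real CRNN).

    Returns a function mapping encoded audio to a score clamped to [0, 1].
    """
    def decode(features: FeatureVectors) -> float:
        if not features:
            return 0.0
        score = sum(weight * vec[1] for vec in features) / len(features)
        return max(0.0, min(1.0, score))
    return decode


@dataclass
class WakewordComponent:
    encoder: Encoder
    decoders: Dict[str, Decoder] = field(default_factory=dict)

    def with_wakeword(self, name: str, decoder: Decoder) -> "WakewordComponent":
        """Build an updated component: same encoder, existing decoders, one new decoder."""
        return WakewordComponent(self.encoder, {**self.decoders, name: decoder})

    def likelihoods(self, audio: Sequence[float]) -> Dict[str, float]:
        """Encode the audio once, then score it with every decoder."""
        encoded = self.encoder(audio)
        return {name: dec(encoded) for name, dec in self.decoders.items()}


def should_start_speech_processing(scores: Dict[str, float],
                                   threshold: float = 0.5) -> bool:
    """Assumed decision rule: proceed if any wakeword score reaches the threshold."""
    return any(score >= threshold for score in scores.values())


# Component as first operated: one encoder, one decoder.
first = WakewordComponent(toy_encoder, {"first_wakeword": make_toy_decoder(0.2)})

# Updated component: reuses the encoder and first decoder, adds a second decoder.
updated = first.with_wakeword("second_wakeword", make_toy_decoder(1.0))

audio = [0.1, -0.8, 0.3, 0.9, 0.0, 0.7, -0.6, 0.2]
scores = updated.likelihoods(audio)
print(scores, should_start_speech_processing(scores))
```

In this sketch, adding a wakeword leaves the encoder untouched and adds only a decoder, and the encoder runs once per utterance however many decoders are configured.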