CPC G10L 15/063 (2013.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01); G10L 13/00 (2013.01); G10L 15/16 (2013.01); G10L 15/22 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01)] | 18 Claims
1. A computer-implemented method when executed on data processing hardware of a user device causes the data processing hardware to perform operations comprising:
capturing a first set of training audio samples spoken by a user of the user device, each training audio sample containing a custom hotword, the custom hotword comprising one or more words;
obtaining a pre-trained model, the pre-trained model trained by a remote system in communication with the user device;
training, using the pre-trained model, a custom hotword model on the first set of training audio samples to learn how to detect a presence of the custom hotword in audio data;
receiving streaming audio data captured by the user device;
determining, using the trained custom hotword model, whether the custom hotword is present in the streaming audio data; and
when the custom hotword is present in the streaming audio data, initiating a wake-up process on the user device for processing the custom hotword and/or one or more other terms following the custom hotword in the streaming audio data.
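The operations of claim 1 can be illustrated with a minimal sketch. It rests on several assumptions that the claim does not state. A frozen feature extractor (`pretrained_embedding`, hypothetical, a toy per-segment RMS energy vector) stands in for the model pre-trained by the remote system. "Training" the custom hotword model is taken to mean averaging the embeddings of the user's enrollment utterances into one prototype vector, which is a few-shot template-matching approach chosen for illustration because the claim does not specify a training algorithm. Detection compares a sliding window of streaming audio against that prototype by cosine similarity. `CustomHotwordModel`, `process_stream`, the 16-sample window, and the 0.9 threshold are all hypothetical.

```python
import math
from collections import deque
from typing import Callable, Iterable, List, Optional, Sequence


def pretrained_embedding(window: Sequence[float], dim: int = 8) -> List[float]:
    # HYPOTHETICAL stand-in for the remotely pre-trained model: a frozen
    # extractor mapping an audio window to a unit-length vector of
    # per-segment RMS energies. A real system would use a neural network.
    seg = max(1, len(window) // dim)
    feats = []
    for i in range(dim):
        chunk = window[i * seg:(i + 1) * seg]
        feats.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)) if chunk else 0.0)
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]


class CustomHotwordModel:
    """Few-shot hotword detector on top of a frozen embedding (hypothetical)."""

    def __init__(self, embed: Callable[[Sequence[float]], List[float]], threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.prototype: Optional[List[float]] = None

    def train(self, samples: Iterable[Sequence[float]]) -> None:
        # Average the embeddings of the user's enrollment utterances and
        # renormalize; the pre-trained embedding itself is left unchanged.
        embs = [self.embed(s) for s in samples]
        if not embs:
            raise ValueError("need at least one training sample")
        mean = [sum(col) / len(embs) for col in zip(*embs)]
        norm = math.sqrt(sum(m * m for m in mean)) or 1.0
        self.prototype = [m / norm for m in mean]

    def score(self, window: Sequence[float]) -> float:
        if self.prototype is None:
            raise RuntimeError("model has not been trained")
        # Cosine similarity: both vectors are unit length (or all zeros).
        return sum(a * b for a, b in zip(self.embed(window), self.prototype))

    def detect(self, window: Sequence[float]) -> bool:
        return self.score(window) >= self.threshold


def process_stream(model: CustomHotwordModel, chunks: Iterable[Sequence[float]],
                   window_len: int, on_wake: Callable[[int], None]) -> List[int]:
    # Slide a fixed-length window over incoming chunks; on a detection,
    # fire the wake-up callback and reset the buffer so one utterance
    # does not trigger repeatedly.
    buffer: deque = deque(maxlen=window_len)
    triggered = []
    for idx, chunk in enumerate(chunks):
        buffer.extend(chunk)
        if len(buffer) == window_len and model.detect(list(buffer)):
            on_wake(idx)
            triggered.append(idx)
            buffer.clear()
    return triggered


# Usage with synthetic data: a toy "hotword" energy envelope of 16 samples.
PATTERN = [0.2, 0.2, 1.0, 1.0] * 4


def hotword(scale: float) -> List[float]:
    return [scale * x for x in PATTERN]


model = CustomHotwordModel(pretrained_embedding)
model.train([hotword(0.8), hotword(1.0), hotword(1.3)])  # enrollment samples

silence = [0.0] * 16
stream = silence + hotword(1.1) + silence
chunks = [stream[i:i + 4] for i in range(0, len(stream), 4)]
woke = process_stream(model, chunks, window_len=16,
                      on_wake=lambda i: print(f"wake-up at chunk {i}"))
# prints "wake-up at chunk 7"
```

Because the frozen embedding is shared and only a small prototype is learned on-device, enrollment needs only a few utterances; a production system would more plausibly fine-tune a small neural head, which this sketch does not attempt.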