CPC G10L 21/00 (2013.01) [G06F 18/217 (2023.01); G10L 13/027 (2013.01); G10L 13/086 (2013.01); G10L 17/24 (2013.01); G10L 25/87 (2013.01)] | 16 Claims |
1. A computer-implemented method that, when executed on data processing hardware, causes the data processing hardware to perform operations comprising:
obtaining a plurality of training samples for training a hotword detector model, the plurality of training samples comprising:
positive training samples comprising audio representations of a hotword; and
negative training samples comprising synthesized speech utterances generated as output from a text-to-speech (TTS) system, wherein:
at least one of the synthesized speech utterances of the negative training samples includes first speech units and does not include the hotword;
at least another one of the synthesized speech utterances of the negative training samples includes second speech units and the hotword, the second speech units different than the first speech units and configured to prevent the hotword detector model from detecting the hotword in the at least another one of the synthesized speech utterances; and
the TTS system is configured to generate each of the synthesized speech utterances of the negative training samples from corresponding text input data by converting the corresponding text input data into each of the synthesized speech utterances;
training the hotword detector model on the plurality of training samples to teach the hotword detector model to learn to discern between non-synthesized speech comprising the hotword and synthesized speech comprising the hotword;
receiving audio input data that comprises the non-synthesized speech comprising the hotword and one or more other terms following the hotword;
detecting, using the trained hotword detector model, that the audio input data comprises the non-synthesized speech comprising the hotword; and
based on detecting that the audio input data comprises the non-synthesized speech comprising the hotword, performing speech recognition on the one or more other terms following the hotword.
|
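The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. All names here (`TrainingSample`, `build_training_set`, `handle_audio`, `fake_tts`) are hypothetical, the TTS system is replaced by a stand-in stub, and model training is left out because the claim does not specify an architecture. The sketch shows only two things: synthesized utterances are labeled as negatives whether or not their text contains the hotword, and speech recognition runs only after the detector fires.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Audio = List[float]  # assumption: audio is represented as a flat list of samples


@dataclass
class TrainingSample:
    audio: Audio
    label: int    # 1 = detector should fire, 0 = detector should stay silent
    source: str   # "human" or "tts", kept only for inspection


def fake_tts(text: str) -> Audio:
    """Hypothetical stand-in for a TTS system: maps text to pseudo-audio."""
    return [float(ord(ch)) for ch in text]


def build_training_set(
    human_hotword_audio: List[Audio],
    negative_texts: List[str],
    tts: Callable[[str], Audio],
) -> List[TrainingSample]:
    # Positive samples: audio representations of the hotword.
    samples = [TrainingSample(a, 1, "human") for a in human_hotword_audio]
    # Negative samples: every synthesized utterance is labeled 0,
    # including utterances whose text contains the hotword.
    for text in negative_texts:
        samples.append(TrainingSample(tts(text), 0, "tts"))
    return samples


def handle_audio(
    audio: Audio,
    detector: Callable[[Audio], bool],
    recognizer: Callable[[Audio], str],
) -> Optional[str]:
    # Speech recognition is gated on the trained detector firing.
    if detector(audio):
        return recognizer(audio)
    return None


samples = build_training_set(
    human_hotword_audio=[[0.1, 0.2, 0.3]],
    negative_texts=["play some music", "hey device what time is it"],
    tts=fake_tts,
)
```

In this sketch the second negative text contains a placeholder hotword ("hey device") yet still receives label 0, which mirrors the claim's requirement that the detector learn to discern synthesized speech comprising the hotword from non-synthesized speech comprising it.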