CPC G10L 15/05 (2013.01) [G06N 3/096 (2023.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 15/20 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 2015/088 (2013.01)] | 20 Claims |
1. A method of generating a trained trigger word detection model, the method comprising:
training an auxiliary model, based on an auxiliary task, to concentrate on one or more utterances and to learn context of the one or more utterances using phrase training data, wherein the phrase training data comprises pairs of first audio files and second audio files, each of the first audio files comprising first clean audio of a first comparison utterance, and each of the second audio files comprising at least one of a third clean audio, different from the first clean audio, of the first comparison utterance, and a fourth clean audio of a second comparison utterance, different from the first comparison utterance; and
obtaining a trigger word detection model by retraining one or more final layers of the auxiliary model, which is weighted based on the auxiliary task, based on a trigger word detection task that detects one or more trigger words;
wherein the retraining uses training data specific to the one or more trigger words.
|
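The two stages recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It shows one way the phrase training pairs might be assembled and how only the final layers could be marked for retraining. The names `Clip`, `Layer`, `build_phrase_pairs`, and `freeze_all_but_final` are hypothetical, as are the file paths. The same/different label attached to each pair is an assumption, since the claim does not state the auxiliary task's target.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A clean recording of one utterance (hypothetical structure)."""
    path: str
    utterance: str


def build_phrase_pairs(clips, seed=0):
    """Build (first, second, same_utterance) pairs from clean clips.

    For each first clip, the second clip is either a different clean
    recording of the same utterance or a clean recording of a different
    utterance. The boolean label is an assumption made for illustration.
    """
    rng = random.Random(seed)
    pairs = []
    for first in clips:
        same = [c for c in clips
                if c.utterance == first.utterance and c.path != first.path]
        diff = [c for c in clips if c.utterance != first.utterance]
        if same:
            pairs.append((first, rng.choice(same), True))
        if diff:
            pairs.append((first, rng.choice(diff), False))
    return pairs


@dataclass
class Layer:
    """Stand-in for a network layer; only tracks whether it is trainable."""
    name: str
    trainable: bool = True


def freeze_all_but_final(layers, n_final):
    """Keep the auxiliary-task weights fixed except in the last n_final layers."""
    if not 0 < n_final <= len(layers):
        raise ValueError("n_final must be between 1 and len(layers)")
    cutoff = len(layers) - n_final
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return layers


# Hypothetical usage: three clean clips covering two utterances.
clips = [
    Clip("hello_take1.wav", "hello"),
    Clip("hello_take2.wav", "hello"),
    Clip("goodbye_take1.wav", "goodbye"),
]
pairs = build_phrase_pairs(clips)

# After auxiliary training, only the final layer is left trainable for
# the trigger word detection task.
model = freeze_all_but_final([Layer("enc1"), Layer("enc2"), Layer("head")], 1)
```

In a real framework, the `trainable` flag would correspond to disabling gradient updates for the earlier layers, and the retraining step would then use only data specific to the trigger words.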