US 12,236,939 B2
Method of generating a trigger word detection model, and an apparatus for the same
Sivakumar Balasubramanian, Mountain View, CA (US); Gowtham Srinivasan, San Jose, CA (US); Srinivasa Rao Ponakala, Sunnyvale, CA (US); Anil Sunder Yadav, San Jose, CA (US); and Aditya Jajodia, Edison, NJ (US)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Oct. 12, 2021, as Appl. No. 17/499,072.
Claims priority of provisional application 63/160,697, filed on Mar. 12, 2021.
Prior Publication US 2022/0293088 A1, Sep. 15, 2022
Int. Cl. G10L 15/05 (2013.01); G06N 3/096 (2023.01); G10L 15/06 (2013.01); G10L 15/08 (2006.01); G10L 15/16 (2006.01); G10L 15/20 (2006.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)
CPC G10L 15/05 (2013.01) [G06N 3/096 (2023.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 15/20 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 2015/088 (2013.01)]
20 Claims
 
1. A method of generating a trained trigger word detection model, the method comprising:
training an auxiliary model, based on an auxiliary task, to concentrate on one or more utterances and to learn context of the one or more utterances using phrase training data, wherein the phrase training data comprises pairs of first audio files and second audio files, each of the first audio files comprising first clean audio of a first comparison utterance, and each of the second audio files comprising at least one of a third clean audio, different from the first clean audio, of the first comparison utterance, and a fourth clean audio of a second comparison utterance, different from the first comparison utterance; and
obtaining a trigger word detection model by retraining one or more final layers of the auxiliary model, which is weighted based on the auxiliary task, based on a trigger word detection task that detects one or more trigger words;
wherein the retraining uses training data specific to the one or more trigger words.
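The two stages recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a toy two-layer network written in plain Python, random numbers standing in for audio features, and a same-utterance versus different-utterance label as the auxiliary task. The names `build_phrase_pairs`, `TinyModel`, and `final_layer_only` are hypothetical and do not come from the patent, and the step that turns a pair of audio files into a feature vector is left out.

```python
import itertools
import math
import random


def build_phrase_pairs(recordings):
    """Pair up clean recordings given as (clip_id, utterance_text).

    Label 1: two different recordings of the same utterance.
    Label 0: recordings of two different utterances.
    (Assumed labelling scheme, for illustration only.)
    """
    pairs = []
    for (id_a, utt_a), (id_b, utt_b) in itertools.combinations(recordings, 2):
        pairs.append((id_a, id_b, 1 if utt_a == utt_b else 0))
    return pairs


class TinyModel:
    """Toy network: one tanh body layer followed by a sigmoid final layer."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.body = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                     for _ in range(n_hidden)]
        self.head = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.bias = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.body]
        logit = sum(v * h for v, h in zip(self.head, hidden)) + self.bias
        return hidden, 1.0 / (1.0 + math.exp(-logit))

    def fit(self, examples, epochs=50, lr=0.1, final_layer_only=False):
        """Plain SGD on log loss; optionally update only the final layer."""
        for _ in range(epochs):
            for x, y in examples:
                hidden, p = self.forward(x)
                err = p - y  # gradient of the log loss w.r.t. the logit
                if not final_layer_only:
                    # Body layer is updated only during auxiliary training.
                    for j, row in enumerate(self.body):
                        g = err * self.head[j] * (1.0 - hidden[j] ** 2)
                        for i in range(len(row)):
                            row[i] -= lr * g * x[i]
                for j in range(len(self.head)):
                    self.head[j] -= lr * err * hidden[j]
                self.bias -= lr * err


# Phrase training data: pairs of clean recordings (hypothetical clip ids).
recordings = [("a1", "turn on the lights"),
              ("a2", "turn on the lights"),
              ("b1", "play some music")]
pairs = build_phrase_pairs(recordings)

# Random stand-ins for feature vectors; real features would come from audio.
rng = random.Random(1)
aux_examples = [([rng.uniform(-1, 1) for _ in range(4)], rng.randint(0, 1))
                for _ in range(20)]
trigger_examples = [([rng.uniform(-1, 1) for _ in range(4)], rng.randint(0, 1))
                    for _ in range(10)]

model = TinyModel(n_in=4, n_hidden=3)
model.fit(aux_examples)                              # stage 1: auxiliary task
body_after_aux = [row[:] for row in model.body]
head_after_aux = model.head[:]
model.fit(trigger_examples, final_layer_only=True)   # stage 2: retrain final layer
```

In this sketch the body layer keeps the weights learned on the auxiliary task, and only the final layer is adjusted with the trigger-word-specific data, which mirrors the retraining step of the claim at toy scale.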