US 12,272,357 B2
System and method for accent-agnostic frame-level wake word detection
Sivakumar Balasubramanian, Sunnyvale, CA (US); Gowtham Srinivasan, San Jose, CA (US); Srinivasa Rao Ponakala, Sunnyvale, CA (US); Vijendra Raj Apsingekar, San Jose, CA (US); and Anil Sunder Yadav, San Jose, CA (US)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Sep. 1, 2022, as Appl. No. 17/929,280.
Claims priority of provisional application 63/341,139, filed on May 12, 2022.
Prior Publication US 2023/0368786 A1, Nov. 16, 2023
Int. Cl. G10L 15/22 (2006.01); G10L 15/06 (2013.01)
CPC G10L 15/22 (2013.01) [G10L 15/063 (2013.01); G10L 2015/0631 (2013.01); G10L 2015/223 (2013.01)] 24 Claims
 
1. A method comprising:
accessing, using at least one processor of an electronic device, a machine learning model, wherein the machine learning model is a trained student model that is trained using audio samples in a plurality of accent types, and wherein the trained student model is trained to detect a wake word using information distilled from a plurality of trained teacher keyword detector models, each trained teacher keyword detector model associated with a different accent type of the plurality of accent types;
receiving, using the at least one processor, an audio input from an audio input device;
providing, using the at least one processor, the audio input to the trained student model;
calculating, using the trained student model, frame-level probabilities associated with the audio input;
receiving, using the at least one processor, an output from the trained student model including the frame-level probabilities associated with the audio input; and
instructing, using the at least one processor, at least one action based on the frame-level probabilities associated with the audio input.
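By way of non-limiting illustration, the receiving, providing, calculating, and instructing steps recited above may be sketched as follows. The sketch is not taken from the specification. Every name in it (`split_into_frames`, `toy_student_model`, `wake_word_detected`, `handle_audio_input`), the frame length, the probability threshold, and the consecutive-frame decision rule are assumptions chosen for readability. In particular, `toy_student_model` is a hypothetical energy-based stand-in and is not a distilled student model; a trained student model would take its place in practice.

```python
import math
from typing import Callable, List, Sequence

FrameProbs = List[float]


def split_into_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut the audio input into fixed-length frames (assumed framing scheme)."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def toy_student_model(frames: List[Sequence[float]]) -> FrameProbs:
    """Hypothetical stand-in for the trained student model.

    Maps mean frame energy to a value in (0, 1) with a logistic function,
    so that one probability is produced per frame. This is NOT a real
    wake word detector; it only mimics the frame-level output shape.
    """
    probs = []
    for frame in frames:
        energy = sum(x * x for x in frame) / len(frame)
        probs.append(1.0 / (1.0 + math.exp(-(10.0 * energy - 5.0))))
    return probs


def wake_word_detected(frame_probs: FrameProbs, threshold: float = 0.8,
                       min_consecutive: int = 3) -> bool:
    """Assumed decision rule: fire when enough consecutive frames exceed the threshold."""
    run = 0
    for p in frame_probs:
        run = run + 1 if p >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


def handle_audio_input(samples: Sequence[float],
                       model: Callable[[List[Sequence[float]]], FrameProbs] = toy_student_model) -> str:
    """Provide audio to the model, read frame-level probabilities, and pick an action."""
    frames = split_into_frames(samples, frame_len=4, hop=4)
    frame_probs = model(frames)
    # The returned string stands in for "instructing at least one action".
    return "wake" if wake_word_detected(frame_probs) else "ignore"


if __name__ == "__main__":
    print(handle_audio_input([1.0] * 40))  # high-energy input
    print(handle_audio_input([0.0] * 40))  # silent input
```

In this sketch the action is decided from the per-frame probabilities alone, rather than from a single utterance-level score, which mirrors the frame-level character of the claimed output. Requiring several consecutive frames above the threshold is one possible way to suppress isolated spurious frames; other decision rules could equally be substituted.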