| CPC G10L 15/16 (2013.01) [G10L 15/04 (2013.01); G10L 15/08 (2013.01); G10L 25/78 (2013.01); G10L 2015/088 (2013.01); G10L 25/30 (2013.01); G10L 2025/783 (2013.01); G10L 25/87 (2013.01); G10L 25/93 (2013.01)] | 30 Claims |

|
11. A method for processing audio signals, comprising:
receiving a first audio frame associated with a first time frame;
generating a first time frame feature vector based on the first audio frame;
determining a distance between the first time frame feature vector and a second time frame feature vector, the second time frame feature vector generated based on a second audio frame associated with a second time frame;
comparing the distance between the first time frame feature vector and the second time frame feature vector to a threshold distance;
determining whether to skip processing of the first audio frame by a keyword application based on the comparison;
generating, by a machine learning model, a third frame score associated with a third time frame, the third time frame being before the second time frame, and wherein the third frame score indicates a probability that a third time frame feature vector, associated with the third frame score, is a representation of a first portion of a keyword;
generating, by the machine learning model, a second frame score associated with the second time frame, wherein the second frame score indicates that the second time frame feature vector is a representation of a second portion of the keyword;
determining that the third frame score is greater than the second frame score; and
determining to not process the first audio frame based on the determination that the third frame score is greater than the second frame score.
|
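
The skip decision recited in claim 11 can be illustrated with a minimal sketch. It is not taken from the patent. The function names, the use of Euclidean distance, the reading that a distance below the threshold favors skipping, and the choice to require both conditions together are all assumptions made for illustration; the claim does not fix a distance metric or say how the two determinations combine.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two feature vectors (Euclidean metric is an assumption)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def should_skip_frame(
    first_vec: Sequence[float],
    second_vec: Sequence[float],
    third_score: float,
    second_score: float,
    threshold: float,
) -> bool:
    """Hypothetical helper: decide whether the keyword application skips the first frame.

    first_vec / second_vec: feature vectors of the first and second time frames.
    third_score / second_score: model scores for the third (earlier) and second frames.
    threshold: threshold distance used in the comparison.
    """
    # Compare the vector distance to the threshold distance.
    # Assumption: a small distance means the new frame adds little information.
    is_similar = euclidean_distance(first_vec, second_vec) < threshold

    # The earlier (third) frame scored higher than the second frame.
    score_declining = third_score > second_score

    # Assumption: both conditions must hold before the frame is skipped.
    return is_similar and score_declining


if __name__ == "__main__":
    # Similar frames and a declining score: the frame is skipped.
    print(should_skip_frame([1.0, 2.0], [1.0, 2.1], 0.8, 0.5, threshold=0.5))
    # Dissimilar frames: the frame is processed.
    print(should_skip_frame([1.0, 2.0], [4.0, 6.0], 0.8, 0.5, threshold=0.5))
```

In this sketch the scores would come from the machine learning model named in the claim; here they are passed in as plain numbers so the decision logic can be read in isolation.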