US 12,266,351 B2
Adaptive frame skipping for speech recognition
Uday Reddy Thummaluri, Nalgonda (IN); Sachin Abdagire, Hyderabad (IN); and Prapulla Vuppu, Secunderabad (IN)
Assigned to QUALCOMM Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Aug. 26, 2022, as Appl. No. 17/822,715.
Prior Publication US 2024/0071370 A1, Feb. 29, 2024
Int. Cl. G10L 15/04 (2013.01); G10L 15/08 (2006.01); G10L 15/16 (2006.01); G10L 25/78 (2013.01); G10L 25/30 (2013.01); G10L 25/87 (2013.01); G10L 25/93 (2013.01)

CPC G10L 15/16 (2013.01) [G10L 15/04 (2013.01); G10L 15/08 (2013.01); G10L 25/78 (2013.01); G10L 2015/088 (2013.01); G10L 25/30 (2013.01); G10L 2025/783 (2013.01); G10L 25/87 (2013.01); G10L 25/93 (2013.01)]

30 Claims

11. A method for processing audio signals, comprising:

receiving a first audio frame associated with a first time frame;

generating a first time frame feature vector based on the first audio frame;

determining a distance between the first time frame feature vector and a second time frame feature vector, the second time frame feature vector generated based on a second audio frame associated with a second time frame;

comparing the distance between the first time frame feature vector and the second time frame feature vector to a threshold distance;

determining whether to skip processing of the first audio frame by a keyword application based on the comparison;

generating, by a machine learning model, a third frame score associated with a third time frame, the third time frame being before the second time frame, and wherein the third frame score indicates a probability that a third time frame feature vector, associated with the third frame score, is a representation of a first portion of a keyword;

generating, by the machine learning model, a second frame score associated with the second time frame, wherein the second frame score indicates that the second time frame feature vector is a representation of a second portion of the keyword;

determining that the third frame score is greater than the second frame score; and

determining to not process the first audio frame based on the determination that the third frame score is greater than the second frame score.
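The decision logic recited in claim 11 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation, and it makes several assumptions the claim leaves open. It uses Euclidean distance as the distance metric. It skips a frame when its distance from the preceding frame falls below the threshold, on the reading that a near-duplicate frame adds little new information. It assumes the time order third < second < first. It combines the distance test and the score test with a logical OR. The function names `euclidean_distance` and `should_skip_frame` are hypothetical. The feature extractor and the machine learning model that produce the vectors and scores are not shown.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors.

    Assumption: the claim only recites "a distance"; Euclidean is one choice.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def should_skip_frame(
    first_vec: Sequence[float],
    second_vec: Sequence[float],
    threshold: float,
    third_score: float,
    second_score: float,
) -> bool:
    """Return True if the keyword application should skip the first (current) frame.

    Hypothetical rule combining the two tests in the claim:
    - distance test: the first frame's feature vector is closer than `threshold`
      to the second (previous) frame's vector (assumed skip direction);
    - score test: the model's score for the older third frame is greater than
      its score for the more recent second frame.
    Either test alone triggers a skip (OR combination is an assumption).
    """
    too_similar = euclidean_distance(first_vec, second_vec) < threshold
    score_decreasing = third_score > second_score
    return too_similar or score_decreasing


if __name__ == "__main__":
    # Nearly identical consecutive frames: skipped by the distance test.
    print(should_skip_frame([0.1, 0.2], [0.1, 0.21], 0.05, 0.2, 0.6))  # True
    # Distinct frames and a rising keyword score: processed.
    print(should_skip_frame([0.0, 0.0], [3.0, 4.0], 0.5, 0.2, 0.6))  # False
    # Distinct frames but a falling keyword score: skipped by the score test.
    print(should_skip_frame([0.0, 0.0], [3.0, 4.0], 0.5, 0.8, 0.3))  # True
```

Comparing only against the immediately preceding frame keeps the check at one vector distance per frame. The claim does not say which stored feature vector and scores should serve as the reference after a frame has been skipped, so the sketch leaves that bookkeeping to the caller.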