US 12,223,952 B2
Generation and utilization of pseudo-correction(s) to prevent forgetting of personalized on-device automatic speech recognition (ASR) model(s)
Rajiv Mathews, Sunnyvale, CA (US); Dragan Zivkovic, Sunnyvale, CA (US); and Khe Chai Sim, Dublin, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Oct. 4, 2022, as Appl. No. 17/959,637.
Prior Publication US 2024/0112672 A1, Apr. 4, 2024
Int. Cl. G10L 15/00 (2013.01); G10L 15/06 (2013.01); G10L 15/19 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)
CPC G10L 15/19 (2013.01) [G10L 15/063 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 2015/0635 (2013.01)] 20 Claims
 
1. A method implemented by one or more processors of a client device, the method comprising:
receiving, via one or more microphones of the client device, audio data that captures a spoken utterance of a user of the client device;
determining, based on on-device automatic speech recognition (ASR) processing of the audio data using one or more ASR models stored locally in on-device storage of the client device, whether to generate a pseudo-correction that is to be subsequently utilized in updating one or more of the ASR models and that is to be generated based on a prior actual correction made by the user of the client device directed to prior on-device ASR processing; and
in response to determining to generate the pseudo-correction:
storing, in the on-device storage of the client device and in association with a pseudo-correction time to live (TTL) in the on-device storage of the client device for the pseudo-correction that lapses subsequent to a correction TTL in the on-device storage of the client device for the prior actual correction, at least a portion of a given speech hypothesis and an alternate speech hypothesis generated based on the on-device ASR processing as the pseudo-correction for one or more of the on-device ASR models; and
storing, in the on-device storage of the client device and in association with the pseudo-correction, the audio data that captures the spoken utterance; and
causing one or more of the on-device ASR models to be updated based on at least the pseudo-correction.
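The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the patented implementation. Every name in it (`Correction`, `PseudoCorrection`, `OnDeviceStore`, `find_prior_correction`, `maybe_store_pseudo_correction`, `training_examples`) is invented for this example. Several choices are assumptions, not claim language:

- on-device storage is modeled as in-memory lists with expiry timestamps;
- the decision to generate a pseudo-correction uses a simple substring-matching rule against a prior actual correction;
- the pseudo-correction TTL is anchored to the end of the correction TTL, so it always lapses later, as the claim requires.

The model update itself (for example, local fine-tuning) is not shown. The sketch only assembles the (audio, target) pairs such an update could consume.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Correction:
    """Hypothetical prior actual correction: the user changed `original` to `corrected`."""
    original: str
    corrected: str
    expires_at: float  # end of the correction TTL (seconds on a hypothetical clock)


@dataclass
class PseudoCorrection:
    given: str         # given speech hypothesis from on-device ASR processing
    alternate: str     # alternate speech hypothesis from on-device ASR processing
    audio: bytes       # audio data capturing the spoken utterance
    expires_at: float  # end of the pseudo-correction TTL


@dataclass
class OnDeviceStore:
    """Hypothetical stand-in for the client device's on-device storage."""
    corrections: List[Correction] = field(default_factory=list)
    pseudo_corrections: List[PseudoCorrection] = field(default_factory=list)

    def purge_expired(self, now: float) -> None:
        # Drop every entry whose TTL has lapsed at time `now`.
        self.corrections = [c for c in self.corrections if c.expires_at > now]
        self.pseudo_corrections = [
            p for p in self.pseudo_corrections if p.expires_at > now
        ]


def find_prior_correction(
    store: OnDeviceStore, given: str, alternate: str
) -> Optional[Correction]:
    # Assumed trigger: the given hypothesis repeats the term the user previously
    # corrected, and the alternate hypothesis contains the user's replacement.
    for c in store.corrections:
        if c.original in given and c.corrected in alternate:
            return c
    return None


def maybe_store_pseudo_correction(
    store: OnDeviceStore,
    given: str,
    alternate: str,
    audio: bytes,
    now: float,
    pseudo_ttl: float,
) -> Optional[PseudoCorrection]:
    store.purge_expired(now)
    prior = find_prior_correction(store, given, alternate)
    if prior is None:
        return None  # decided not to generate a pseudo-correction
    # Assumed TTL policy: the pseudo-correction lives `pseudo_ttl` seconds past
    # the prior correction's TTL, so it always lapses after the correction does.
    pc = PseudoCorrection(given, alternate, audio, prior.expires_at + pseudo_ttl)
    store.pseudo_corrections.append(pc)  # hypotheses and audio stored together
    return pc


def training_examples(store: OnDeviceStore, now: float) -> List[Tuple[bytes, str]]:
    # (audio, target transcript) pairs for a local ASR model update; the
    # alternate hypothesis is used as the target, mirroring the user's correction.
    store.purge_expired(now)
    return [(p.audio, p.alternate) for p in store.pseudo_corrections]
```

Consider a correction that expires at t=100 and a pseudo-correction created at t=50 with `pseudo_ttl=30.0`. Under these assumptions, the pseudo-correction expires at t=130. Calling `training_examples(store, now=120.0)` still returns the utterance paired with the corrected transcript, even though the actual correction has already been purged. That is the forgetting-prevention effect the title refers to.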