US 12,340,799 B2
Identifying and correcting automatic speech recognition (ASR) misrecognitions in a decentralized manner
Rajiv Mathews, Sunnyvale, CA (US); Rohit Prabhavalkar, Santa Clara, CA (US); Giovanni Motta, San Jose, CA (US); Mingqing Chen, Saratoga, CA (US); Lillian Zhou, Mountain View, CA (US); Dhruv Guliani, San Francisco, CA (US); Harry Zhang, Sunnyvale, CA (US); Trevor Strohman, Sunnyvale, CA (US); and Françoise Beaufays, Mountain View, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Oct. 3, 2022, as Appl. No. 17/958,887.
Prior Publication US 2024/0112673 A1, Apr. 4, 2024
Int. Cl. G10L 15/00 (2013.01); G10L 15/06 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)
CPC G10L 15/197 (2013.01) [G10L 15/063 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 15/00 (2013.01); G10L 2015/0635 (2013.01)]
20 Claims
 
1. A method implemented by one or more processors of a remote system, the method comprising:
receiving, from a plurality of client devices, corresponding candidate correction pairs, each of the corresponding candidate correction pairs including:
a corresponding portion of a corresponding predicted textual segment that was generated based on processing corresponding audio data locally at a corresponding one of the plurality of client devices and using a corresponding on-device automatic speech recognition (ASR) model, and
a corresponding alternate textual segment that was generated locally at the corresponding one of the plurality of client devices and based on a corresponding modification to the corresponding portion of the corresponding predicted textual segment that resulted in the corresponding alternate textual segment;
determining whether a given corresponding candidate correction pair, of the corresponding candidate correction pairs, is a corresponding actual correction pair based on a threshold quantity of occurrences of the given corresponding candidate correction pair being received from one or more of the plurality of client devices; and
in response to determining that the given corresponding candidate correction pair is a corresponding actual correction pair:
identifying, from among the plurality of client devices that provided the given corresponding candidate correction pair, a subset of the plurality of client devices that provided the given corresponding candidate correction pair; and
causing a global ASR model, that is a global-based counterpart of the corresponding on-device ASR models, to be updated in a decentralized manner and utilizing the subset of the plurality of client devices that provided the given corresponding candidate correction pair.
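
The server-side aggregation recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `CandidateCorrectionPair` and `select_actual_corrections`, the `(device_id, pair)` report format, and the reading of "threshold quantity" as "count greater than or equal to the threshold" are all assumptions made for illustration. The decentralized update of the global ASR model is not shown; the sketch only yields the subset of client devices that would take part in it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateCorrectionPair:
    """Hypothetical container for one candidate correction pair."""
    predicted_portion: str   # portion of the on-device ASR predicted text
    alternate_segment: str   # text that resulted from the modification


def select_actual_corrections(reports, threshold):
    """Map each actual correction pair to the devices that provided it.

    reports: iterable of (device_id, CandidateCorrectionPair) tuples.
    threshold: assumed minimum number of received occurrences for a
        candidate pair to be treated as an actual correction pair.
    """
    occurrences = defaultdict(int)   # pair -> number of times received
    providers = defaultdict(set)     # pair -> ids of devices that sent it

    for device_id, pair in reports:
        occurrences[pair] += 1
        providers[pair].add(device_id)

    # Keep only the pairs whose occurrence count meets the threshold.
    return {
        pair: providers[pair]
        for pair, count in occurrences.items()
        if count >= threshold
    }
```

Under these assumptions, a pair reported three times with `threshold=3` is kept together with the set of devices that reported it, and a pair reported only once is dropped. The returned device sets correspond to the "subset of the plurality of client devices" that the claim uses for the decentralized update.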