CPC G10L 15/197 (2013.01) [G10L 15/063 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 15/00 (2013.01); G10L 2015/0635 (2013.01)] | 20 Claims |
1. A method implemented by one or more processors of a remote system, the method comprising:
receiving, from a plurality of client devices, corresponding candidate correction pairs, each of the corresponding candidate correction pairs including:
a corresponding portion of a corresponding predicted textual segment that was generated based on processing corresponding audio data locally at a corresponding one of the plurality of client devices and using a corresponding on-device automatic speech recognition (ASR) model, and
a corresponding alternate textual segment that was generated locally at the corresponding one of the plurality of client devices and based on a corresponding modification to the corresponding portion of the corresponding predicted textual segment that resulted in the corresponding alternate textual segment;
determining whether a given corresponding candidate correction pair, of the corresponding candidate correction pairs, is a corresponding actual correction pair based on a threshold quantity of occurrences of the given corresponding candidate correction pair being received from one or more of the plurality of client devices; and
in response to determining that the given corresponding candidate correction pair is a corresponding actual correction pair:
identifying, from among the plurality of client devices that provided the given corresponding candidate correction pair, a subset of the plurality of client devices that provided the given corresponding candidate correction pair; and
causing a global ASR model, that is a global-based counterpart of the corresponding on-device ASR models, to be updated in a decentralized manner and utilizing the subset of the plurality of client devices that provided the given corresponding candidate correction pair.
|
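The server-side determination recited in claim 1 can be illustrated with a minimal sketch. It assumes each client report arrives as a (device identifier, correction pair) tuple and that the threshold is a simple occurrence count. The names `CorrectionPair` and `find_actual_corrections`, and the report format, are hypothetical and do not come from the claim. The decentralized update of the global ASR model is not modeled; the sketch only returns the devices that would be candidates for it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: (portion of predicted segment, alternate segment).
CorrectionPair = Tuple[str, str]


def find_actual_corrections(
    reports: Iterable[Tuple[str, CorrectionPair]],
    threshold: int,
) -> Dict[CorrectionPair, Set[str]]:
    """Map each pair seen at least `threshold` times to the devices that sent it.

    Occurrences are counted per report, so repeated reports from the same
    device each count toward the threshold (an assumption of this sketch).
    """
    counts: Dict[CorrectionPair, int] = defaultdict(int)
    providers: Dict[CorrectionPair, Set[str]] = defaultdict(set)

    for device_id, pair in reports:
        counts[pair] += 1
        providers[pair].add(device_id)

    # A candidate pair is treated as an actual correction pair once its
    # occurrence count reaches the threshold.
    return {
        pair: providers[pair]
        for pair, count in counts.items()
        if count >= threshold
    }


# Illustrative reports; device IDs and text segments are made up.
example_reports = [
    ("device_a", ("there", "their")),
    ("device_b", ("there", "their")),
    ("device_a", ("there", "their")),
    ("device_c", ("cat", "cot")),
]
actual = find_actual_corrections(example_reports, threshold=3)
```

Here the returned set contains every device that provided the pair. The claim only requires a subset of those devices, so a real system could narrow the set further before using it to update the global model.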