CPC G10L 25/51 (2013.01) [G10L 15/005 (2013.01); G10L 15/18 (2013.01)]
20 Claims

1. A method implemented by one or more processors of a client device, the method comprising:
receiving, from a given radio station, a stream of audio data that captures a stream of spoken utterances in a given language;
generating, based on processing the stream of audio data, an audio-fingerprint for the stream of audio data;
determining, based on comparing the audio-fingerprint for the stream of audio data to a database of audio-fingerprints, whether the stream of audio data has been previously utilized in generating a gradient for updating a global machine learning (ML) model with respect to the given language; and
in response to determining that the stream of audio data has not been previously utilized in generating a gradient for updating the global ML model with respect to the given language:
processing, using an on-device ML model that is stored in on-device storage of the client device and that is an on-device counterpart of the global ML model, the stream of audio data;
generating, using an unsupervised or self-supervised learning technique, and based on processing the stream of audio data using the on-device ML model, the gradient; and
transmitting the gradient to a remote system to be utilized in updating the global ML model with respect to the given language.
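The control flow of claim 1 (fingerprint the stream, check it against previously used fingerprints for the given language, and only then compute and transmit a self-supervised gradient) can be illustrated with a minimal sketch. It is not the claimed implementation, and every concrete choice below is a hypothetical assumption. `audio_fingerprint` hashes a toy energy-contour bit string with SHA-256 rather than computing a noise-robust fingerprint. The fingerprint database is a local dict keyed by language. The on-device ML model is a two-parameter linear predictor. The self-supervised objective is next-frame energy prediction. `transmit` stands in for sending the gradient to the remote system. The claim also does not say who adds fingerprints to the database; recording the fingerprint after use is an assumption.

```python
import hashlib
from typing import Callable, Dict, List, Set, Tuple

FRAME_LEN = 4  # samples per frame; toy value chosen for illustration (assumption)

Gradient = Tuple[float, float]  # (dL/dw, dL/db) for the hypothetical linear model


def frame_energies(samples: List[float], frame_len: int = FRAME_LEN) -> List[float]:
    """Energy (sum of squares) of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def audio_fingerprint(samples: List[float]) -> str:
    """Hypothetical fingerprint: one bit per adjacent frame pair (1 if energy rises), hashed.

    Streams with the same rise/fall pattern map to the same value; this is an
    exact-match stand-in, not a noise-robust audio fingerprint.
    """
    energies = frame_energies(samples)
    bits = "".join("1" if b > a else "0" for a, b in zip(energies, energies[1:]))
    return hashlib.sha256(bits.encode("ascii")).hexdigest()


def self_supervised_gradient(samples: List[float], w: float, b: float) -> Gradient:
    """MSE gradient for predicting frame t+1's energy from frame t's energy.

    The targets are derived from the audio itself, so no labels are needed.
    """
    energies = frame_energies(samples)
    xs, ys = energies[:-1], energies[1:]
    if not xs:
        return (0.0, 0.0)  # fewer than two frames: nothing to learn from
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    n = len(xs)
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    return (grad_w, grad_b)


def maybe_contribute(
    samples: List[float],
    language: str,
    fingerprint_db: Dict[str, Set[str]],
    model: Dict[str, float],
    transmit: Callable[[str, Gradient], None],
) -> bool:
    """Send a gradient only if this stream has not been used for `language` before."""
    fp = audio_fingerprint(samples)
    used = fingerprint_db.setdefault(language, set())
    if fp in used:
        return False  # already contributed for this language; skip
    grad = self_supervised_gradient(samples, model["w"], model["b"])
    transmit(language, grad)  # stand-in for uploading to the remote system
    used.add(fp)  # assumption: record the stream so it is not reused
    return True


if __name__ == "__main__":
    sent: List[Tuple[str, Gradient]] = []
    db: Dict[str, Set[str]] = {}
    model = {"w": 0.0, "b": 0.0}
    clip = [0.5] * 4 + [1.0] * 4  # two frames with energies 1.0 and 4.0

    def record(lang: str, grad: Gradient) -> None:
        sent.append((lang, grad))

    maybe_contribute(clip, "en", db, model, record)
    maybe_contribute(clip, "en", db, model, record)  # duplicate: skipped
    print(sent)  # [('en', (-8.0, -8.0))]
```

Keying the database by language mirrors the claim's "with respect to the given language" qualifier. Under this sketch, the same stream can still contribute once per language, but never twice for the same one.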