CPC G10L 15/065 (2013.01) [G10L 13/04 (2013.01); G10L 15/26 (2013.01); G10L 15/30 (2013.01)] | 20 Claims |
1. A method implemented by one or more processors of a remote system, the method comprising:
receiving a plurality of client gradients from a plurality of corresponding client devices, wherein each of the plurality of client gradients is generated locally at a given one of the plurality of corresponding client devices based on processing corresponding audio data that captures at least part of a corresponding spoken utterance of a corresponding user of the given one of the plurality of corresponding client devices;
generating a plurality of remote gradients, wherein generating each of the plurality of remote gradients comprises:
obtaining additional audio data that captures at least part of an additional spoken utterance of an additional user;
processing, using a global machine learning (ML) model stored remotely at the remote system, the additional audio data to generate additional predicted output; and
generating an additional gradient, for inclusion in the plurality of remote gradients, based on comparing the additional predicted output to ground truth output corresponding to the additional audio data;
selecting a set of client gradients from among the plurality of client gradients;
selecting an additional set of remote gradients from among the plurality of remote gradients; and
utilizing the set of client gradients and the additional set of remote gradients to update weights of the global ML model.
|
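For illustration only, the steps recited in claim 1 can be sketched as a single server-side training round. This is a minimal sketch under stated assumptions and is not the claimed implementation: the claim does not specify the model, the loss, the selection rule, or the update rule. Here the global ML model is assumed to be a linear model, the audio data is stood in for by a numeric feature vector, the ground truth output is a scalar, selection is uniform random sampling, and the update is a gradient-descent step on the mean of the selected gradients. All function and parameter names (`remote_gradient`, `select_gradients`, `update_weights`, `training_round`, `k_client`, `k_remote`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Assumed stand-in for the global ML model: a linear model."""
    return sum(w * x for w, x in zip(weights, features))


def remote_gradient(weights: Sequence[float],
                    features: Sequence[float],
                    target: float) -> Vector:
    """Gradient of the assumed loss 0.5 * (prediction - target)^2,
    obtained by comparing predicted output to ground truth output."""
    error = predict(weights, features) - target
    return [error * x for x in features]


def select_gradients(gradients: Sequence[Vector], k: int,
                     rng: random.Random) -> List[Vector]:
    """Assumed selection rule: uniform sampling without replacement."""
    return rng.sample(list(gradients), min(k, len(gradients)))


def update_weights(weights: Sequence[float],
                   client_set: Sequence[Vector],
                   remote_set: Sequence[Vector],
                   learning_rate: float) -> Vector:
    """Assumed update rule: one gradient-descent step on the mean
    of all selected client and remote gradients."""
    selected = list(client_set) + list(remote_set)
    if not selected:
        return list(weights)
    mean = [sum(g[i] for g in selected) / len(selected)
            for i in range(len(weights))]
    return [w - learning_rate * m for w, m in zip(weights, mean)]


def training_round(weights: Sequence[float],
                   client_gradients: Sequence[Vector],
                   remote_examples: Sequence[Tuple[Vector, float]],
                   k_client: int,
                   k_remote: int,
                   learning_rate: float = 0.1,
                   seed: int = 0) -> Vector:
    """One round: receive client gradients, generate remote gradients,
    select a set of each, and update the global weights."""
    rng = random.Random(seed)
    remote_gradients = [remote_gradient(weights, x, y)
                        for x, y in remote_examples]
    client_set = select_gradients(client_gradients, k_client, rng)
    remote_set = select_gradients(remote_gradients, k_remote, rng)
    return update_weights(weights, client_set, remote_set, learning_rate)
```

In this sketch the client gradients arrive already computed, mirroring the claim's requirement that they are generated locally at the client devices, while the remote gradients are computed at the remote system from its own examples. A call such as `training_round([0.0, 0.0], [[1.0, 0.0]], [([1.0, 2.0], 1.0)], 1, 1, learning_rate=0.5)` mixes one client gradient with one remote gradient in a single update.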