CPC G10L 15/063 (2013.01) [G10L 15/187 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 2015/0635 (2013.01)]    20 Claims
1. A method performed by one or more processors of a client device, the method comprising:
detecting, via one or more microphones of the client device, audio data that captures at least part of a spoken utterance of a user of the client device;
processing, using a local machine learning model stored locally on the client device, the audio data to generate predicted output,
wherein the local machine learning model includes at least a first set of local machine learning model layers and a second set of local machine learning model layers,
wherein the first set of local machine learning model layers is used in generating, based on processing the audio data, an encoding of the audio data, and
wherein the second set of local machine learning model layers is used in generating, based on processing the encoding of the audio data that is generated using the first set of local machine learning model layers, the predicted output;
generating, using unsupervised learning, a gradient based on the predicted output; and
transmitting, to a remote system and from the client device, the generated gradient to cause the remote system to utilize the generated gradient to update weights of global machine learning model layers that are stored remotely at the remote system and that correspond structurally to the first set of local machine learning model layers, of the local machine learning model, used in generating the encoding of the audio data, and
subsequent to the remote system updating the weights of the global machine learning model layers utilizing the generated gradient received from the client device and additional gradients received from additional client devices:
receiving, at the client device and from the remote system, a combined machine learning model that includes the updated global machine learning model layers and one or more additional layers; and
using the combined machine learning model to make at least one prediction based on further audio data, detected via one or more of the microphones of the client device, that captures at least part of a further spoken utterance of the user of the client device.
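The split recited in the claim, a first set of layers that produces an encoding of the audio data and a second set that produces the predicted output from that encoding, can be illustrated with a minimal sketch. The sketch below assumes PyTorch, log-mel input features, a convolutional encoder, and a linear prediction head; these layer types, dimensions, and names (LocalSpeechModel, encoder_layers, prediction_layers) are hypothetical and are not recited in the claim.

# Minimal sketch of the claimed split local model, assuming PyTorch.
import torch
import torch.nn as nn

class LocalSpeechModel(nn.Module):
    def __init__(self, n_mels=80, enc_dim=256, vocab_size=32):
        super().__init__()
        # First set of local machine learning model layers: audio features -> encoding.
        self.encoder_layers = nn.Sequential(
            nn.Conv1d(n_mels, enc_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(enc_dim, enc_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Second set of local machine learning model layers: encoding -> predicted output.
        self.prediction_layers = nn.Linear(enc_dim, vocab_size)

    def forward(self, audio_features):
        # audio_features: (batch, n_mels, frames), e.g. log-mel features of the
        # audio data detected via the device microphones.
        encoding = self.encoder_layers(audio_features)
        predicted_output = self.prediction_layers(encoding.transpose(1, 2))
        return encoding, predicted_output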
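The claim recites generating a gradient "using unsupervised learning" based on the predicted output, without specifying the objective. The sketch below builds on LocalSpeechModel above, still assuming PyTorch, and uses entropy minimization over the predicted output purely as a stand-in unsupervised objective; only the gradients for the first set of layers are collected, since only the structurally corresponding global layers are updated at the remote system.

# On-device gradient generation; the entropy-minimization loss is only a
# placeholder for the claim's unspecified unsupervised learning technique.
import torch

def compute_unsupervised_gradient(model, audio_features):
    model.zero_grad()
    _, predicted_output = model(audio_features)
    probs = torch.softmax(predicted_output, dim=-1)
    # Unsupervised stand-in objective: minimize the entropy of the predicted output.
    loss = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    loss.backward()
    # Return only the gradients of the first set of layers; this dictionary is
    # what the client device would serialize and transmit to the remote system
    # (the transport itself is not specified by the claim).
    return {name: p.grad.detach().clone()
            for name, p in model.encoder_layers.named_parameters()}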
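On the remote-system side, the claim requires combining the client's gradient with additional gradients from additional client devices and later returning a combined model made of the updated global layers plus one or more additional layers. The sketch below assumes plain gradient averaging (federated SGD), a global copy of the encoder layers with the same structure as encoder_layers above, and a freshly initialized prediction head as the additional layers; none of these choices are specified by the claim.

# Remote-system update and combined-model construction, assuming PyTorch and
# the LocalSpeechModel sketch above.
import torch

def apply_client_gradients(global_encoder_layers, client_gradients, lr=0.1):
    # Average the gradient received from the client device with additional
    # gradients from additional client devices, then update the weights of the
    # global machine learning model layers.
    with torch.no_grad():
        for name, param in global_encoder_layers.named_parameters():
            avg_grad = torch.stack([g[name] for g in client_gradients]).mean(dim=0)
            param -= lr * avg_grad

def build_combined_model(global_encoder_layers, enc_dim=256, vocab_size=32):
    # Combined model sent back to the client: the updated global layers plus
    # additional (here newly initialized) prediction layers.
    combined = LocalSpeechModel(enc_dim=enc_dim, vocab_size=vocab_size)
    combined.encoder_layers.load_state_dict(global_encoder_layers.state_dict())
    return combined

# Client-side use of the combined model on further audio data:
#   combined = build_combined_model(global_encoder_layers)
#   combined.eval()
#   with torch.no_grad():
#       _, prediction = combined(torch.randn(1, 80, 200))  # stand-in features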