US 11,854,534 B1
Asynchronous optimization for sequence training of neural networks
Georg Heigold, Mountain View, CA (US); Erik McDermott, San Francisco, CA (US); Vincent O. Vanhoucke, San Francisco, CA (US); Andrew W. Senior, New York, NY (US); and Michiel A. U. Bacchiani, Summit, NJ (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 20, 2022, as Appl. No. 18/069,035.
Application 18/069,035 is a continuation of application No. 17/644,362, filed on Dec. 15, 2021, granted, now 11,557,277, issued on Jan. 17, 2023.
Application 17/644,362 is a continuation of application No. 17/143,140, filed on Jan. 6, 2021, granted, now 11,227,582, issued on Jan. 18, 2022.
Application 17/143,140 is a continuation of application No. 16/863,432, filed on Apr. 30, 2020, granted, now 10,916,238, issued on Feb. 9, 2021.
Application 16/863,432 is a continuation of application No. 16/573,323, filed on Sep. 17, 2019, granted, now 10,672,384, issued on Jun. 2, 2020.
Application 16/573,323 is a continuation of application No. 15/910,720, filed on Mar. 2, 2018, granted, now 10,482,873, issued on Nov. 19, 2019.
Application 15/910,720 is a continuation of application No. 14/258,139, filed on Apr. 22, 2014, granted, now 10,019,985, issued on Jul. 10, 2018.
Claims priority of provisional application 61/899,466, filed on Nov. 4, 2013.
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/06 (2013.01); G10L 15/183 (2013.01); G06N 3/045 (2023.01)
CPC G10L 15/063 (2013.01) [G06N 3/045 (2023.01); G10L 15/06 (2013.01); G10L 15/183 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising:
receiving a replica of a neural network model;
obtaining a batch of training utterances each comprising one or more predetermined words spoken by a speaker;
training, by performing stochastic gradient descent optimization on the batch of training utterances, the replica of the neural network model to generate corresponding model parameter gradients for the neural network model; and
sending the corresponding model parameter gradients for the neural network model to a centralized server.
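The exemplary claim describes one worker's iteration in an asynchronous, parameter-server style of distributed stochastic gradient descent: fetch a replica of the model, compute gradients on a local batch, and ship those gradients to a centralized server. The following is a minimal sketch of that loop, not the patented implementation: all names (ParameterServer, fetch_replica, apply_gradients, worker_step) are hypothetical, a toy linear model with a mean-squared-error loss stands in for the neural network and its sequence-training objective, and an in-process object stands in for the networked centralized server.

    import numpy as np

    class ParameterServer:
        """Stands in for the centralized server holding the canonical parameters."""

        def __init__(self, dim, lr=0.1):
            self.params = np.zeros(dim)
            self.lr = lr

        def fetch_replica(self):
            # "receiving a replica of a neural network model"
            return self.params.copy()

        def apply_gradients(self, grads):
            # "sending the corresponding model parameter gradients ... to a
            # centralized server": the server folds each worker's gradients
            # into the shared model as they arrive.
            self.params -= self.lr * grads

    def worker_step(server, features, targets):
        """One asynchronous worker iteration over one batch."""
        replica = server.fetch_replica()          # receive a model replica
        predictions = features @ replica          # forward pass (toy linear model)
        errors = predictions - targets
        # Mean-squared-error gradient w.r.t. the parameters; a real system
        # would backpropagate a sequence-training loss through the network.
        grads = features.T @ errors / len(targets)
        server.apply_gradients(grads)             # send gradients to the server

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        server = ParameterServer(dim=4)
        true_params = rng.normal(size=4)
        for _ in range(200):                      # many worker iterations
            x = rng.normal(size=(8, 4))           # a "batch" of 8 examples
            y = x @ true_params
            worker_step(server, x, y)
        print("learned parameters:", np.round(server.params, 2))

In the asynchronous setting the patent targets, many such workers would run this loop in parallel against the same server without locking, so the replica each worker trains on may be slightly stale relative to the server's current parameters; the claims apply the scheme to batches of training utterances containing predetermined words spoken by a speaker.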