US 12,073,823 B2
Asynchronous optimization for sequence training of neural networks
Georg Heigold, Mountain View, CA (US); Erik McDermott, San Francisco, CA (US); Vincent O. Vanhoucke, San Francisco, CA (US); Andrew W. Senior, New York, NY (US); and Michiel A. U. Bacchiani, Summit, NJ (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Nov. 10, 2023, as Appl. No. 18/506,540.
Application 18/506,540 is a continuation of application No. 18/069,035, filed on Dec. 20, 2022, granted, now 11,854,534.
Application 18/069,035 is a continuation of application No. 17/644,362, filed on Dec. 15, 2021, granted, now 11,557,277, issued on Jan. 17, 2023.
Application 17/644,362 is a continuation of application No. 17/143,140, filed on Jan. 6, 2021, granted, now 11,227,582, issued on Jan. 18, 2022.
Application 17/143,140 is a continuation of application No. 16/863,432, filed on Apr. 30, 2020, granted, now 10,916,238, issued on Feb. 9, 2021.
Application 16/863,432 is a continuation of application No. 16/573,323, filed on Sep. 17, 2019, granted, now 10,672,384, issued on Jun. 2, 2020.
Application 16/573,323 is a continuation of application No. 15/910,720, filed on Mar. 2, 2018, granted, now 10,482,873, issued on Nov. 19, 2019.
Application 15/910,720 is a continuation of application No. 14/258,139, filed on Apr. 22, 2014, granted, now 10,019,985, issued on Jul. 10, 2018.
Claims priority of provisional application 61/899,466, filed on Nov. 4, 2013.
Prior Publication US 2024/0087559 A1, Mar. 14, 2024
Int. Cl. G10L 15/06 (2013.01); G06N 3/045 (2023.01); G10L 15/16 (2006.01); G10L 15/183 (2013.01)
CPC G10L 15/063 (2013.01) [G06N 3/045 (2023.01); G10L 15/16 (2013.01); G10L 15/183 (2013.01)] 20 Claims
 
1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising:
receiving a current set of parameter values for a neural network model executable on the data processing hardware;
obtaining training utterances each comprising one or more predetermined words spoken by a speaker;
processing the training utterances using the current set of parameter values for the neural network model;
determining updated parameter values for the neural network model based on the processing of the training utterances; and
performing stochastic gradient descent optimization to determine corresponding model parameter gradients for the neural network model based on differences between the current set of parameter values and the updated parameter values for the neural network model.
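Claim 1 recites an asynchronous, distributed form of stochastic gradient descent: a worker receives the current parameter values, trains locally on batches of utterances, and the gradient applied to the shared model is derived from the difference between the current and the locally updated parameter values. The following is a minimal sketch of that scheme under stated assumptions, not the patented implementation: a toy linear model and synthetic vectors stand in for acoustic features of spoken training utterances, and the names ParameterServer, worker_loop, and local_training_step are illustrative only.

    # Minimal sketch of the asynchronous update scheme recited in claim 1.
    # Assumptions: toy linear model, synthetic data; all names are illustrative.
    import threading
    import numpy as np

    class ParameterServer:
        """Holds the current parameter values and applies asynchronous updates."""
        def __init__(self, dim, lr=0.5):
            self.params = np.zeros(dim)
            self.lr = lr
            self.lock = threading.Lock()

        def get_params(self):
            # A worker receives the current set of parameter values.
            with self.lock:
                return self.params.copy()

        def apply_update(self, current, updated):
            # Per the claim, the model parameter gradient is determined from the
            # difference between the current and the updated parameter values.
            gradient = current - updated
            with self.lock:
                # Asynchronous SGD step: other workers may have moved the shared
                # parameters since `current` was read, so this gradient is stale.
                self.params -= self.lr * gradient

    def local_training_step(params, features, targets, step_size=0.1):
        """Process one batch of training utterances (here: synthetic vectors)
        and return locally updated parameter values."""
        preds = features @ params
        grad = features.T @ (preds - targets) / len(targets)
        return params - step_size * grad

    def worker_loop(server, rng, steps, dim):
        for _ in range(steps):
            current = server.get_params()
            # Synthetic stand-ins for feature vectors of spoken utterances.
            features = rng.normal(size=(16, dim))
            targets = features @ np.ones(dim)  # "true" model is all-ones
            updated = local_training_step(current, features, targets)
            server.apply_update(current, updated)

    if __name__ == "__main__":
        dim = 8
        server = ParameterServer(dim)
        workers = [
            threading.Thread(target=worker_loop,
                             args=(server, np.random.default_rng(seed), 200, dim))
            for seed in range(4)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print("learned parameters:", np.round(server.params, 2))

In this arrangement the workers never exchange gradients directly; the shared parameters may change between get_params and apply_update, and tolerating that staleness in exchange for lock-free worker throughput is the essence of the asynchronous optimization the claim describes.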