US 12,333,433 B2
Training neural networks using priority queues
Mohammad Norouzi, Richmond Hill (CA); Daniel Aaron Abolafia, Sunnyvale, CA (US); and Quoc V. Le, Sunnyvale, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Sep. 21, 2023, as Appl. No. 18/471,404.
Application 18/471,404 is a continuation of application No. 16/174,126, filed on Oct. 29, 2018, granted, now 11,797,839.
Claims priority of provisional application 62/578,355, filed on Oct. 27, 2017.
Prior Publication US 2024/0127058 A1, Apr. 18, 2024
Int. Cl. G06N 3/08 (2023.01); G06N 3/044 (2023.01)

CPC G06N 3/08 (2013.01) [G06N 3/044 (2023.01)]

20 Claims

1. A method performed by a system of one or more computers and for training a controller neural network having a plurality of controller parameters to generate output sequences by determining trained values of the controller parameters from initial values of the controller parameters, the method comprising:

maintaining data identifying a set of K output sequences that were previously generated by the controller neural network during the training and, for each output sequence in the set, a respective reward that measures a quality of the output sequence, wherein K is an integer greater than one;

in each of a plurality of iterations, performing:

determining a first update to the current values of the controller parameters using one or more output sequences selected from the set of K output sequences;

generating a batch of new output sequences using the controller neural network in accordance with the current values of the controller parameters;

obtaining a respective reward for each of the new output sequences;

determining, from the new output sequences and the output sequences in the maintained data, the K output sequences that have the highest rewards; and

modifying the maintained data to identify the determined K output sequences and the respective reward for each of the K output sequences.
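The loop recited in claim 1 can be illustrated with a minimal Python sketch. Several things in it are assumptions and are not taken from the claim:

- The controller is a hypothetical toy (`ToyController`): an independent softmax over tokens at each output position, not a neural network.
- The "first update" is assumed to be one gradient-ascent step on the mean log-likelihood of the stored sequences. The claim only says that the update uses sequences selected from the set. The sketch also applies the update immediately.
- The claim does not say how the set starts. Here it starts empty, and the update is skipped until the set has entries.
- Duplicate sequences collapse to a single entry.
- `match_reward` and `TARGET` are made up for illustration.

This is not the patentee's implementation.

```python
import heapq
import math
import random
from typing import Callable, Dict, List, Tuple

Seq = Tuple[int, ...]


class ToyController:
    """Hypothetical stand-in for the controller network: one independent
    softmax over `vocab` tokens at each of `length` output positions."""

    def __init__(self, length: int, vocab: int, seed: int = 0) -> None:
        self.length, self.vocab = length, vocab
        # Initial values of the controller parameters (all-zero logits).
        self.logits = [[0.0] * vocab for _ in range(length)]
        self.rng = random.Random(seed)

    def probs(self, pos: int) -> List[float]:
        row = self.logits[pos]
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self) -> Seq:
        """Generate one output sequence under the current parameter values."""
        return tuple(
            self.rng.choices(range(self.vocab), weights=self.probs(p))[0]
            for p in range(self.length)
        )

    def likelihood_step(self, seqs: List[Seq], lr: float) -> None:
        """One gradient-ascent step on the mean log-likelihood of `seqs`.
        For a softmax, d log p(a) / d logit_j = 1[j == a] - p_j."""
        grads = [[0.0] * self.vocab for _ in range(self.length)]
        for seq in seqs:
            for pos, tok in enumerate(seq):
                p = self.probs(pos)
                for j in range(self.vocab):
                    grads[pos][j] += (float(j == tok) - p[j]) / len(seqs)
        # Apply all gradients only after they are computed from the old logits.
        for pos in range(self.length):
            for j in range(self.vocab):
                self.logits[pos][j] += lr * grads[pos][j]


def update_top_k(queue: Dict[Seq, float], scored: List[Tuple[Seq, float]],
                 k: int) -> Dict[Seq, float]:
    """Keep the K highest-reward sequences among the stored and new ones.
    Duplicate sequences collapse to a single entry (an assumption)."""
    merged = dict(queue)
    merged.update(scored)
    return dict(heapq.nlargest(k, merged.items(), key=lambda item: item[1]))


def train(reward_fn: Callable[[Seq], float], length: int = 4, vocab: int = 3,
          k: int = 4, batch_size: int = 8, iterations: int = 200,
          lr: float = 0.5, seed: int = 0) -> Tuple[ToyController, Dict[Seq, float]]:
    ctrl = ToyController(length, vocab, seed)
    queue: Dict[Seq, float] = {}  # maintained data: sequence -> reward
    for _ in range(iterations):
        # First update, computed from the sequences in the stored top-K set
        # (skipped while the set is still empty).
        if queue:
            ctrl.likelihood_step(list(queue), lr)
        # Generate a batch of new sequences and obtain a reward for each one.
        batch = [ctrl.sample() for _ in range(batch_size)]
        scored = [(s, reward_fn(s)) for s in batch]
        # Replace the stored set with the K best of the old and new sequences.
        queue = update_top_k(queue, scored, k)
    return ctrl, queue


TARGET = (2, 0, 1, 2)  # hypothetical target used only by the toy reward


def match_reward(seq: Seq) -> float:
    """Hypothetical reward: number of positions that agree with TARGET."""
    return float(sum(a == b for a, b in zip(seq, TARGET)))


if __name__ == "__main__":
    _, best = train(match_reward)
    for seq, r in sorted(best.items(), key=lambda item: -item[1]):
        print(seq, r)
```

A dictionary rebuilt with `heapq.nlargest` keeps the sketch short. A long-running trainer might instead keep a size-K min-heap, so that a new sequence is compared only against the current lowest reward.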