US 11,900,235 B1
Generating parameter values for recurrent neural networks
Andrew M. Dai, San Francisco, CA (US); Quoc V. Le, Sunnyvale, CA (US); and David Ha, San Francisco, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Sep. 9, 2021, as Appl. No. 17/470,716.
Application 17/470,716 is a continuation of application No. 15/716,330, filed on Sep. 26, 2017, granted, now Pat. No. 11,164,066.
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/04 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/04 (2013.01) [G06N 3/08 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method of training a main recurrent neural network comprising one or more recurrent neural network layers and respective hyper recurrent neural networks corresponding to each of the one or more recurrent neural network layers, the method comprising:
processing a sequence of network inputs having a respective network input at each of a plurality of time steps to generate a sequence of network outputs, the processing comprising, for each of the plurality of time steps:
for each of the one or more recurrent neural network layers of the main recurrent neural network:
providing a layer input for the recurrent neural network layer at the time step as input to the respective hyper recurrent neural network corresponding to the recurrent neural network layer, wherein the hyper recurrent neural network is configured to:
generate a current hypernetwork input by combining: (1) the layer input at the time step to the corresponding recurrent neural network layer, (2) a current layer hidden state of the corresponding recurrent neural network layer, and (3) a current hypernetwork hidden state of the hyper recurrent neural network; and
process the current hypernetwork input to generate an updated hypernetwork hidden state;
determining, from the updated hypernetwork hidden state for the corresponding hyper recurrent neural network, updated values for a set of layer parameters for the recurrent neural network layer; and
providing the layer input as input to the recurrent neural network layer, wherein the recurrent neural network layer is configured to process the layer input and a current layer hidden state in accordance with the updated values of the set of layer parameters to generate an updated layer hidden state; and
updating parameters of the hyper recurrent neural networks based on the sequence of network outputs through backpropagation.
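For illustration, the following is a minimal NumPy sketch of the per-time-step forward pass recited in claim 1, assuming a single main recurrent layer with a vanilla tanh cell and a dense projection from the hyper hidden state to the layer's full parameter vector. The dimensions, variable names, and cell form are illustrative assumptions rather than anything specified in the claim, and the claimed training step (updating the hyper recurrent neural network's parameters by backpropagation based on the sequence of network outputs) is not shown, since it would require an automatic-differentiation framework.

```python
# Illustrative sketch only: one main recurrent layer whose weights are
# regenerated at every time step by a hyper recurrent network, following the
# structure of claim 1. All dimensions and the tanh cell are assumptions.
import numpy as np

rng = np.random.default_rng(0)

IN_DIM, MAIN_DIM, HYPER_DIM = 4, 8, 16

# Hyper RNN parameters (these are what the claimed method trains by backprop).
W_hyper = rng.normal(0, 0.1, (HYPER_DIM, IN_DIM + MAIN_DIM + HYPER_DIM))
b_hyper = np.zeros(HYPER_DIM)

# Projection from the updated hyper hidden state to the main layer's
# full parameter vector (weight matrix plus bias).
n_main_params = MAIN_DIM * (IN_DIM + MAIN_DIM) + MAIN_DIM
W_proj = rng.normal(0, 0.01, (n_main_params, HYPER_DIM))

def step(x_t, h_main, h_hyper):
    """One time step of the main layer with hypernetwork-generated weights."""
    # Combine (1) the layer input, (2) the current layer hidden state, and
    # (3) the current hypernetwork hidden state into the hypernetwork input.
    hyper_in = np.concatenate([x_t, h_main, h_hyper])
    # Process the hypernetwork input to generate an updated hyper hidden state.
    h_hyper = np.tanh(W_hyper @ hyper_in + b_hyper)
    # Determine updated values for the main layer's parameters from it.
    theta = W_proj @ h_hyper
    split = MAIN_DIM * (IN_DIM + MAIN_DIM)
    W_main = theta[:split].reshape(MAIN_DIM, IN_DIM + MAIN_DIM)
    b_main = theta[split:]
    # The main layer processes the layer input and its current hidden state
    # in accordance with the freshly generated parameter values.
    h_main = np.tanh(W_main @ np.concatenate([x_t, h_main]) + b_main)
    return h_main, h_hyper

# Unroll over a toy input sequence; the layer hidden states stand in for the
# network outputs that the training step would backpropagate through.
h_main, h_hyper = np.zeros(MAIN_DIM), np.zeros(HYPER_DIM)
for x_t in rng.normal(size=(5, IN_DIM)):
    h_main, h_hyper = step(x_t, h_main, h_hyper)
print(h_main)
```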