US 12,260,311 B2
Neural networks with pre-normalized layers or regularization normalization layers
Jascha Narain Sohl-Dickstein, San Francisco, CA (US); and Vinay Srinivas Rao, Palo Alto, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Oct. 4, 2021, as Appl. No. 17/493,790.
Claims priority of provisional application 63/087,112, filed on Oct. 2, 2020.
Prior Publication US 2022/0108149 A1, Apr. 7, 2022
Int. Cl. G06N 3/04 (2023.01); G06N 3/08 (2023.01)

CPC G06N 3/04 (2013.01) [G06N 3/08 (2013.01)]

21 Claims

11. A method performed by one or more computers, the method comprising:

receiving a network input; and

processing the network input using a neural network to generate a network output for the network input, wherein the neural network is configured to process the network input through a plurality of neural network layers to generate the network output, the plurality of neural network layers comprising a pre-normalized layer that is configured to, during processing of the network input by the neural network, perform operations comprising:

receiving a layer input comprising a plurality of first elements; and

generating a layer output from the layer input, comprising:

computing a mean of the plurality of first elements in the layer input;

generating a shifted layer input by subtracting, from each first element in the layer input, the mean of the plurality of first elements;

applying an affine operation to the shifted layer input in accordance with a set of weights of the pre-normalized layer to generate a transformed layer input having a plurality of transformed elements;

computing a standard deviation of the transformed elements; and

generating a scaled layer input by dividing each transformed element by the standard deviation.
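
The operations recited in claim 11 can be illustrated with a minimal sketch in plain Python. It is not the patented implementation. The function name pre_normalized_layer, the matrix-plus-bias form of the affine operation, the use of the population standard deviation, and the optional eps stabilizer are all assumptions made for illustration; the claim itself recites only "a set of weights" and "a standard deviation".

```python
import math
from typing import List, Sequence


def pre_normalized_layer(
    layer_input: Sequence[float],
    weights: Sequence[Sequence[float]],  # assumed shape: [num_outputs][num_inputs]
    bias: Sequence[float],               # assumed bias term of the affine operation
    eps: float = 0.0,                    # assumed stabilizer, not recited in the claim
) -> List[float]:
    # Compute the mean of the first elements in the layer input.
    mean = sum(layer_input) / len(layer_input)

    # Generate the shifted layer input by subtracting the mean from each element.
    shifted = [x - mean for x in layer_input]

    # Apply the affine operation (here: weights @ shifted + bias).
    transformed = [
        sum(w * s for w, s in zip(row, shifted)) + b
        for row, b in zip(weights, bias)
    ]

    # Compute the standard deviation of the transformed elements
    # (population form, which is an assumption).
    t_mean = sum(transformed) / len(transformed)
    variance = sum((t - t_mean) ** 2 for t in transformed) / len(transformed)
    std = math.sqrt(variance + eps)

    # Generate the scaled layer input by dividing each transformed element
    # by the standard deviation.
    return [t / std for t in transformed]


# Hypothetical example values, chosen only to exercise the function.
example_weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
example_bias = [0.5, 0.0, -0.5]
out = pre_normalized_layer([1.0, 2.0, 3.0, 6.0], example_weights, example_bias)
```

In this sketch the mean is removed before the affine operation and the division by the standard deviation happens after it, matching the order of the claimed steps. With eps left at zero, the returned elements have unit population standard deviation.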