| CPC G06N 3/04 (2013.01) [G06N 3/08 (2013.01)] | 21 Claims |

|
11. A method performed by one or more computers, the method comprising:
receiving a network input; and
processing the network input using a neural network to generate a network output for the network input, wherein the neural network is configured to process the network input through a plurality of neural network layers to generate the network output, the plurality of neural network layers comprising a pre-normalized layer that is configured to, during processing of the network input by the neural network, perform operations comprising:
receiving a layer input comprising a plurality of first elements; and
generating a layer output from the layer input, comprising:
computing a mean of the plurality of first elements in the layer input;
generating a shifted layer input by subtracting, from each first element in the layer input, the mean of the plurality of first elements;
applying an affine operation to the shifted layer input in accordance with a set of weights of the pre-normalized layer to generate a transformed layer input having a plurality of transformed elements;
computing a standard deviation of the transformed elements; and
generating a scaled layer input by dividing each transformed element by the standard deviation.
|
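The layer operations recited in claim 11 can be sketched for a single input vector as follows. This is a minimal illustration, not the patented implementation. The function name `pre_normalized_layer`, the affine form `W @ shifted + b`, the separate `bias` argument, and the use of the population standard deviation are all assumptions made for the example; the claim specifies only "a set of weights" and "a standard deviation". The zero-deviation check is likewise an addition that the claim does not recite.

```python
import math
from typing import List, Sequence


def pre_normalized_layer(
    layer_input: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical sketch of the claimed pre-normalized layer steps."""
    # Compute the mean of the first elements in the layer input.
    mean = sum(layer_input) / len(layer_input)

    # Generate the shifted layer input by subtracting the mean from each element.
    shifted = [x - mean for x in layer_input]

    # Apply an affine operation; the form W @ shifted + b is an assumption.
    transformed = [
        sum(w * s for w, s in zip(row, shifted)) + b
        for row, b in zip(weights, bias)
    ]

    # Compute the standard deviation of the transformed elements
    # (population form, chosen here as an assumption).
    t_mean = sum(transformed) / len(transformed)
    variance = sum((t - t_mean) ** 2 for t in transformed) / len(transformed)
    std = math.sqrt(variance)
    if std == 0.0:
        # Guard added for the sketch only; the claim recites no such check.
        raise ValueError("standard deviation of transformed elements is zero")

    # Generate the scaled layer input by dividing each transformed element.
    return [t / std for t in transformed]


# Example call with an identity weight matrix and zero bias.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scaled = pre_normalized_layer([1.0, 2.0, 3.0], identity, [0.0, 0.0, 0.0])
```

Note that, as in the claim, the mean is subtracted before the affine operation while the division by the standard deviation happens after it, so the scaled output has unit standard deviation but is not re-centered.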