US 12,033,067 B2
Quantizing neural networks with batch normalization
Suharsh Vikram Sivakumar, San Francisco, CA (US); and Raghuraman Krishnamoorthi, San Ramon, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Jan. 30, 2019, as Appl. No. 16/262,772.
Claims priority of provisional application 62/753,595, filed on Oct. 31, 2018.
Prior Publication US 2020/0134448 A1, Apr. 30, 2020
Int. Cl. G06N 3/08 (2023.01); G06F 7/483 (2006.01); G06N 3/04 (2023.01)
CPC G06N 3/08 (2013.01) [G06F 7/483 (2013.01); G06N 3/04 (2013.01)] 20 Claims
 
1. A method of training a neural network that has a batch normalized first neural network layer, the method comprising:
during the training, maintaining (i) moving averages of batch normalization statistics for the batch normalized first neural network layer and (ii) floating point weights for the batch normalized first neural network layer;
receiving a first batch of training data;
processing the first batch of training data through one or more layers of the neural network to generate a batch of layer inputs to the batch normalized first neural network layer;
determining batch normalization statistics for the first batch of training data using the floating point weights, comprising:
performing operations of the batch normalized first neural network layer on the batch of layer inputs using the floating point weights to generate a batch of initial layer outputs; and
computing batch normalization statistics based on the batch of initial layer outputs;
determining a correction factor from the batch normalization statistics for the first batch of training data and the moving averages of the batch normalization statistics;
generating batch normalized weights from the floating point weights for the batch normalized first neural network layer using the correction factor, comprising applying the correction factor to the floating point weights of the batch normalized first neural network layer;
quantizing the batch normalized weights to generate quantized batch normalized weights;
after quantizing the batch normalized weights, performing operations of the batch normalized first neural network layer on the batch of layer inputs using the quantized batch normalized weights to generate a batch of layer outputs, comprising:
applying the quantized batch normalized weights to each of the batch of layer inputs to generate a respective initial output for each layer input in the batch of layer inputs;
determining whether the training of the neural network has reached a first threshold;
in response to determining that the training of the neural network has reached the first threshold, switching to using a long-term moving average of the batch normalization statistics by refraining from dividing each respective initial output by the correction factor to introduce a dependence on the batch normalization statistics;
determining, using the batch of layer outputs, a gradient of an objective function with respect to the quantized batch normalized weights for the first neural network layer; and
updating the floating point weights of the first neural network layer using the gradient with respect to the quantized batch normalized weights.
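The following is a minimal NumPy sketch of one plausible reading of the training step recited in claim 1, for a single fully connected layer. It assumes the correction factor is the ratio of the batch standard deviation to its long-term moving average, tracks only the variance statistics (the mean/bias handling and the backward pass are omitted for brevity), and uses identifiers (fake_quantize, train_step, freeze_step) that are illustrative rather than taken from the patent:

    import numpy as np

    def fake_quantize(w, num_bits=8):
        """Simulated (fake) quantization: snap to a symmetric num_bits grid, keep float dtype."""
        max_abs = np.max(np.abs(w)) + 1e-12
        scale = max_abs / (2 ** (num_bits - 1) - 1)
        return np.round(w / scale) * scale

    def train_step(x, W, gamma, moving_sigma, step, freeze_step, momentum=0.99, eps=1e-3):
        """One forward pass of the claimed training step for a fully connected layer.

        x:            batch of layer inputs, shape (batch, in_features)
        W:            maintained floating point weights, shape (in_features, out_features)
        gamma:        batch norm scale, shape (out_features,)
        moving_sigma: moving average of the batch standard deviation, shape (out_features,)
        """
        # (1) Float forward pass with the floating point weights to obtain the
        #     batch normalization statistics for this batch.
        initial_float = x @ W
        batch_sigma = np.sqrt(initial_float.var(axis=0) + eps)

        # (2) Correction factor from the batch statistics and the moving averages
        #     (assumed convention: batch std over long-term moving-average std).
        correction = batch_sigma / moving_sigma

        # (3) Fold batch norm into the weights, apply the correction factor, quantize.
        W_bn = W * (gamma / batch_sigma) * correction   # equals W * gamma / moving_sigma
        W_q = fake_quantize(W_bn)

        # (4) Apply the quantized batch normalized weights to the layer inputs.
        initial_out = x @ W_q

        # (5) Before the freeze threshold, divide by the correction factor so the
        #     outputs depend on this batch's statistics; after the threshold,
        #     refrain from dividing and rely on the long-term moving averages.
        if step < freeze_step:
            out = initial_out / correction
        else:
            out = initial_out

        # (6) Update the moving averages of the batch normalization statistics.
        moving_sigma = momentum * moving_sigma + (1.0 - momentum) * batch_sigma

        # Gradients of the objective with respect to W_q would be computed from
        # `out` and applied to the floating point weights W (straight-through).
        return out, moving_sigma

A hypothetical training loop over random data would call the step once per batch and advance past the freeze point:

    W = np.random.randn(16, 8).astype(np.float32) * 0.1
    gamma = np.ones(8, dtype=np.float32)
    moving_sigma = np.ones(8, dtype=np.float32)
    for step in range(200):
        x = np.random.randn(32, 16).astype(np.float32)
        out, moving_sigma = train_step(x, W, gamma, moving_sigma, step, freeze_step=100)

Under these assumptions, dividing the initial outputs by the correction factor before the freeze step makes them depend on the current batch statistics, while skipping the division once the threshold is reached leaves the layer operating on the long-term moving averages, mirroring the final limitations of the claim.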