US 12,307,362 B2
Batch processing in a machine learning computer
Godfrey Da Costa, Bristol (GB); and Carlo Luschi, Oxford (GB)
Assigned to GRAPHCORE LIMITED, Bristol (GB)
Filed by Graphcore Limited, Bristol (GB)
Filed on Jun. 30, 2021, as Appl. No. 17/363,379.
Claims priority of application No. 2017262 (GB), filed on Oct. 30, 2020.
Prior Publication US 2022/0138564 A1, May 5, 2022
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A method of processing batches of data in a computer comprising a plurality of pipelined processing stages each providing one or more layers of a machine learning model, the method comprising:
processing a first batch of data in the pipelined processing stages in a forward pass, each layer using an activation function and weights for that layer to generate an output activation, wherein the output activation provides an input for a next layer of the model, and wherein an output layer generates an output of the model;
comparing the output of the model with a target value in accordance with a loss function to provide an estimate of a gradient;
processing the estimate of the gradient in a backwards pass of the model, each layer computing an estimate of the gradient for that layer with respect to the weights for that layer;
generating updated weights in each layer, the updated weights resulting from processing the estimate of the gradient with respect to the weights for the first batch at that layer in the backwards pass, using a learning rate for the model; and
storing the updated weights of that layer for successive processing on the next batch of data;
wherein the step of generating the updated weights in each layer comprises using a modulation factor computed based on a number of the processing stages between that layer and the output layer, wherein the modulation factor is a function of a square root of a function of n+1, where n is proportional to the number of the processing stages between a present layer and the output layer.
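The per-layer weight update described in the claim can be illustrated with a minimal sketch. The claim only requires that the modulation factor be a function of a square root of a function of n+1; the specific choice 1/sqrt(n+1) below, and the names `modulation_factor` and `update_layer`, are illustrative assumptions, not the patented implementation.

```python
import math

def modulation_factor(n: int) -> float:
    # One illustrative modulation factor: 1 / sqrt(n + 1), where n is
    # (proportional to) the number of pipeline stages between the
    # present layer and the output layer. Layers farther from the
    # output see staler gradients in a pipelined backwards pass, so
    # their effective learning rate is reduced.
    return 1.0 / math.sqrt(n + 1)

def update_layer(weights, grads, base_lr, n):
    # SGD-style update using the per-layer modulated learning rate:
    # w <- w - (base_lr * modulation_factor(n)) * grad
    lr = base_lr * modulation_factor(n)
    return [w - lr * g for w, g in zip(weights, grads)]

# Example: a layer 3 stages from the output layer updates with half
# the base learning rate, since 1 / sqrt(3 + 1) = 0.5.
new_w = update_layer([1.0, 2.0], [0.5, -0.5], base_lr=0.1, n=3)
# new_w is approximately [0.975, 2.025]
```

In this sketch the output layer itself (n = 0) uses the unmodified learning rate, and the modulation decays smoothly with pipeline depth, which is consistent with the claim's requirement that the factor depend on the number of processing stages between each layer and the output layer.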