US 11,941,528 B2
Neural network training in a distributed system
Vignesh Vivekraja, Santa Clara, CA (US); Thiam Khean Hah, Milpitas, CA (US); Randy Renfu Huang, Morgan Hill, CA (US); Ron Diamant, Santa Clara, CA (US); and Richard John Heaton, San Jose, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Sep. 30, 2019, as Appl. No. 16/588,603.
Prior Publication US 2021/0097396 A1, Apr. 1, 2021
Int. Cl. G06N 3/084 (2023.01); G06N 3/063 (2023.01); G06N 3/045 (2023.01); G06N 3/10 (2006.01)
CPC G06N 3/084 (2013.01) [G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/10 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method of training a neural network model in a distributed system, the distributed system comprising a first worker node and a second worker node, the neural network model comprising a first neural network layer and a second neural network layer, the method being performed by the first worker node and comprising:
performing, by the first worker node, backward propagation computations for the second neural network layer to generate second layer data gradients and second layer weight gradients;
generating, by the first worker node, a first plurality of exchange tasks each corresponding to an exchange of a portion of the second layer weight gradients with the second worker node;
executing a first exchange task of the first plurality of exchange tasks to exchange a first portion of the second layer weight gradients with the second worker node;
performing, by the first worker node, backward propagation computations for the first neural network layer based on the second layer data gradients to generate first layer data gradients and first layer weight gradients;
generating, by the first worker node, a second plurality of exchange tasks each corresponding to an exchange of a portion of the first layer weight gradients with the second worker node;
after the execution of the first exchange task completes and before executing remaining exchange tasks of the first plurality of exchange tasks, executing all of the second plurality of exchange tasks to exchange the first layer weight gradients with the second worker node, such that the second plurality of exchange tasks are prioritized over the first plurality of exchange tasks;
updating weights for the first neural network layer based on the exchanged first layer weight gradients;
performing, by the first worker node, forward propagation computations for the first neural network layer based on the updated weights;
executing the remaining exchange tasks of the first plurality of exchange tasks to exchange the remaining portions of the second layer weight gradients with the second worker node; and
updating weights for the second neural network layer based on the exchanged second layer weight gradients.
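The claim above describes a priority-based schedule: exchange tasks for the first layer's weight gradients, generated later in backward propagation, preempt the still-pending exchange tasks for the second layer, because the first layer's updated weights are needed earliest in the next forward pass. A minimal sketch of that scheduling idea is below, using a priority queue in place of the patent's worker-node hardware; the `ExchangeScheduler` and `make_exchange` names are hypothetical illustrations, not from the patent, and the actual gradient exchange (e.g., an all-reduce between worker nodes) is stubbed out with a logging closure.

```python
import heapq

class ExchangeScheduler:
    """Priority queue of weight-gradient exchange tasks.

    Lower priority value runs first; a counter breaks ties in
    insertion order, so portions of one layer exchange in sequence.
    """
    def __init__(self):
        self._heap = []
        self._counter = 0

    def add_task(self, priority, task):
        heapq.heappush(self._heap, (priority, self._counter, task))
        self._counter += 1

    def run_next(self):
        """Execute the highest-priority pending task; None if empty."""
        if not self._heap:
            return None
        _, _, task = heapq.heappop(self._heap)
        return task()

def make_exchange(layer, portion, log):
    # Hypothetical stand-in for exchanging one portion of a layer's
    # weight gradients with the other worker node.
    def task():
        log.append((layer, portion))
        return (layer, portion)
    return task

log = []
sched = ExchangeScheduler()

# Backward propagation reaches layer 2 first: generate a plurality of
# exchange tasks, one per portion of the layer-2 weight gradients.
for p in range(3):
    sched.add_task(priority=2, task=make_exchange("layer2", p, log))

# Execute the first layer-2 exchange task, overlapping it with the
# ongoing layer-1 backward computation.
sched.run_next()

# Backward propagation then produces layer-1 weight gradients; their
# exchange tasks get higher priority (lower value), since layer 1's
# updated weights are needed first in the next forward pass.
for p in range(2):
    sched.add_task(priority=1, task=make_exchange("layer1", p, log))

# Drain the queue: all layer-1 exchanges complete before the
# remaining layer-2 exchange tasks resume.
while sched.run_next() is not None:
    pass

print(log)
# → [('layer2', 0), ('layer1', 0), ('layer1', 1), ('layer2', 1), ('layer2', 2)]
```

The printed order mirrors the claim: one portion of the second-layer gradients is exchanged early, the entire first-layer exchange is then prioritized ahead of the remaining second-layer portions, and the second-layer exchange finishes last.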