US 11,676,021 B1
Multi-model training pipeline in distributed systems
Patricio Kaplan, Palo Alto, CA (US); and Ron Diamant, Santa Clara, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Sep. 19, 2022, as Appl. No. 17/947,355.
Application 17/947,355 is a continuation of application No. 16/835,161, filed on Mar. 30, 2020, granted, now Pat. No. 11,468,325.
Int. Cl. G06N 3/08 (2023.01); G06N 3/045 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
receiving a first neural network model, a first set of weights associated with the first neural network model, a second neural network model, and a second set of weights associated with the second neural network model, wherein the second neural network model is different from the first neural network model;
determining an order between performing computations for the first neural network model and performing computations for the second neural network model at a first worker node and a second worker node of a distributed system;
computing, at the first worker node, a first set of gradients using the first neural network model and the first set of weights and a second set of gradients using the second neural network model and the second set of weights in accordance with the determined order; and
computing, at the second worker node, a first set of synchronized gradients based on the first set of gradients and a second set of synchronized gradients based on the second set of gradients in accordance with the determined order.
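The claimed method can be illustrated with a short sketch. Note this is a hypothetical toy implementation, not the patented system: the claim does not specify model architectures, loss functions, or a synchronization algorithm, so this sketch stands in linear regressors for the two distinct neural network models and uses gradient averaging (as in a typical all-reduce step) for synchronization. The worker "nodes" are plain functions rather than networked processes.

```python
import numpy as np

def linear_grad(weights, inputs, targets):
    """Gradient of mean squared error for a toy linear model y = x @ w."""
    preds = inputs @ weights
    return 2.0 * inputs.T @ (preds - targets) / len(targets)

def first_worker(models, order):
    """Compute a set of gradients for each model, in the determined order."""
    return {name: linear_grad(*models[name]) for name in order}

def second_worker(gradient_replicas, order):
    """Synchronize (here: average) gradient replicas, in the same order."""
    return {name: np.mean([g[name] for g in gradient_replicas], axis=0)
            for name in order}

# Two different models, each with its own set of weights and toy data.
models = {
    "model_a": (np.array([1.0, -1.0]),                 # first set of weights
                np.array([[1.0, 0.0], [0.0, 1.0]]),
                np.array([2.0, 0.0])),
    "model_b": (np.array([0.5]),                       # second set of weights
                np.array([[2.0], [4.0]]),
                np.array([1.0, 2.0])),
}
order = ["model_a", "model_b"]  # determined order of computations

grads = first_worker(models, order)            # gradients at first worker node
synced = second_worker([grads, grads], order)  # synchronized at second node
```

In a real distributed system the second worker would receive gradient replicas from many first-tier workers over a network; here two identical copies are passed in simply to exercise the averaging step.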