US 12,189,717 B1
Automatic partitioning of machine learning models for training across multiple devices
Can Karakus, Redwood City, CA (US); Rahul Raghavendra Huilgol, San Jose, CA (US); Anirudh Subramanian, Redwood City, CA (US); Fei Wu, Sunnyvale, CA (US); Christopher Cade Daniel, San Jose, CA (US); Akhil Mehra, Saratoga, CA (US); Ajay Paidi, Newark, CA (US); Yutong Zhang, Belmont, CA (US); Indu Thangakrishnan, Santa Clara, CA (US); and Luis Alves Pereira Quintela, Mountain View, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Nov. 27, 2020, as Appl. No. 17/105,998.
Int. Cl. G06F 18/21 (2023.01); G06F 9/48 (2006.01); G06F 9/50 (2006.01); G06F 9/54 (2006.01); G06F 18/211 (2023.01); G06N 20/00 (2019.01)
CPC G06F 18/2163 (2023.01) [G06F 9/4881 (2013.01); G06F 9/5066 (2013.01); G06F 9/54 (2013.01); G06F 18/211 (2023.01); G06N 20/00 (2019.01); G06F 2209/5017 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A system for automatic partitioning of machine learning models and parallel execution management for training across a plurality of devices for different machine learning frameworks, the system comprising:
at least one processor; and
a memory, storing program instructions that, when executed by the at least one processor, cause the at least one processor to:
receive a training job for a machine learning model that includes a request for automatic partitioning of the machine learning model across the plurality of devices, wherein the training job is a code file or a script;
evaluate the request to determine one feature in an optimization parameter specified in the request for automatic partitioning, wherein the optimization parameter configures application of a partitioning technique applied to determine the different respective partitions in order to optimize one feature specified in the optimization parameter out of a plurality of features that can be optimized;
determine different respective partitions of the machine learning model based, at least in part, on a number of partitions and the optimization parameter, and wherein to determine the different respective partitions of the machine learning model, the program instructions cause the at least one processor to:
execute a first training run to construct a version of the machine learning model on a central processing unit (CPU) memory; and
apply a selection of a tree-based partitioning algorithm or a graph-based partitioning algorithm using the constructed version of the machine learning model;
generate a schedule for executing the training job across the plurality of devices according to the different respective partitions of the machine learning model; and
cause the training job to be executed according to the schedule.
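
The partition-determination steps recited above (a first training run that constructs the model in CPU memory, followed by a tree-based or graph-based partitioning pass driven by an optimization parameter) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the module tree, cost values, the tree_partition heuristic, and the "memory"/"speed" objectives are hypothetical stand-ins, not the patented implementation.

# Hypothetical sketch only: module names, costs, and the partitioning
# heuristics below are illustrative assumptions, not the patented method.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ModuleNode:
    """A node in the traced model tree: a module with a cost and children."""
    name: str
    cost: float                      # e.g., parameter count or measured memory
    children: List["ModuleNode"] = field(default_factory=list)

def trace_model_on_cpu() -> ModuleNode:
    """Stand-in for the first training run that constructs the model in CPU
    memory; here it simply returns a toy module tree."""
    return ModuleNode("model", 0.0, [
        ModuleNode("embedding", 4.0),
        ModuleNode("encoder", 0.0,
                   [ModuleNode(f"layer{i}", 2.0) for i in range(4)]),
        ModuleNode("head", 1.0),
    ])

def collect_leaves(node: ModuleNode) -> List[ModuleNode]:
    """Flatten the module tree into its leaf modules in traced order."""
    if not node.children:
        return [node]
    leaves: List[ModuleNode] = []
    for child in node.children:
        leaves.extend(collect_leaves(child))
    return leaves

def tree_partition(root: ModuleNode, num_partitions: int,
                   optimize: str) -> Dict[str, int]:
    """Toy tree-based partitioner: 'memory' balances per-device cost,
    'speed' keeps adjacent modules together to reduce communication."""
    leaves = collect_leaves(root)
    assignment: Dict[str, int] = {}
    if optimize == "memory":
        loads = [0.0] * num_partitions
        for leaf in sorted(leaves, key=lambda l: l.cost, reverse=True):
            target = min(range(num_partitions), key=loads.__getitem__)
            assignment[leaf.name] = target
            loads[target] += leaf.cost
    else:  # "speed": contiguous blocks of the traced order per device
        per_part = -(-len(leaves) // num_partitions)  # ceiling division
        for i, leaf in enumerate(leaves):
            assignment[leaf.name] = i // per_part
    return assignment

if __name__ == "__main__":
    model = trace_model_on_cpu()              # first training run on CPU
    print(tree_partition(model, num_partitions=2, optimize="memory"))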
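
The schedule-generation and execution steps can be sketched in the same hedged spirit: given a module-to-device assignment like the one produced above, a simple pipelined schedule interleaves microbatches across the assigned devices. The function name, tuple layout, and microbatch loop below are illustrative assumptions, not the claimed scheduler.

# Hypothetical sketch only: the schedule format and names are assumptions.
from typing import Dict, List, Tuple

def build_pipeline_schedule(assignment: Dict[str, int],
                            module_order: List[str],
                            num_microbatches: int) -> List[Tuple[int, str, int]]:
    """Return (microbatch, module, device) steps: each microbatch passes
    through the modules in traced order on the device each was assigned to."""
    return [(mb, module, assignment[module])
            for mb in range(num_microbatches)
            for module in module_order]

if __name__ == "__main__":
    assignment = {"embedding": 0, "layer0": 0, "layer1": 1, "head": 1}
    order = ["embedding", "layer0", "layer1", "head"]
    for step in build_pipeline_schedule(assignment, order, num_microbatches=2):
        print(step)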