| CPC G06F 9/5011 (2013.01) [G06N 3/04 (2013.01); G06N 20/00 (2019.01); G06F 2212/2542 (2013.01); G06T 1/20 (2013.01)] | 18 Claims |

1. A method for scheduling tasks and allocating resources to perform a machine-learning (“ML”) workload using hardware accelerators that are each configured to implement a neural network comprising a plurality of neural network layers, the method comprising:
determining, based on a request to perform the ML workload, a resource requirement to perform the ML workload using a plurality of hosts;
generating, by a controller, a protocol bit indicating non-uniform memory access (NUMA) locality required for at least one task of the ML workload;
for each host of the plurality of hosts:
assigning, based on the protocol bit and a NUMA topology that includes NUMA nodes within the host, a task to be executed at the host using a plurality of hardware accelerators of the host; and
performing the ML workload by executing the task assigned to the host,
wherein the NUMA nodes include memory that is local to the host, the memory having a socket interface that couples the memory to each hardware accelerator of the plurality of hardware accelerators of the host and to a resource of the host, and wherein at least one NUMA topology specifies:
i) for a first host of the plurality of hosts, a first NUMA node that includes a first memory in a configuration of resources that is local to the first host, and
ii) a second, different memory in a configuration of resources that is local to a second, different host that is remote to the first NUMA node of the first host.
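The claim's scheduling mechanism can be illustrated with a small sketch. This is not the patented implementation; all names (`Host`, `NumaNode`, `Task`, `assign_task`) and the capacity rules are hypothetical, chosen only to show how a per-task NUMA-locality protocol bit could drive assignment: when the bit is set, the task must land on a single NUMA node that has both sufficient local memory and an attached accelerator; when clear, host-level aggregate capacity suffices and memory may span the host's NUMA nodes.

```python
# Illustrative sketch only -- names and policies are assumptions,
# not the claimed method.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NumaNode:
    node_id: int
    local_memory_gb: int            # memory local to this NUMA node
    accelerators: List[str]         # accelerators coupled via the node's socket interface


@dataclass
class Host:
    name: str
    numa_nodes: List[NumaNode]


@dataclass
class Task:
    name: str
    memory_gb: int
    numa_local: bool                # the "protocol bit": True => NUMA locality required


def assign_task(task: Task, hosts: List[Host]) -> Optional[Tuple[str, Optional[int]]]:
    """Return (host name, NUMA node id or None) for the first placement
    that satisfies the task's NUMA-locality bit, or None if no host fits."""
    for host in hosts:
        if task.numa_local:
            # Bit set: memory and an accelerator must share one NUMA node.
            for node in host.numa_nodes:
                if node.local_memory_gb >= task.memory_gb and node.accelerators:
                    return host.name, node.node_id
        else:
            # Bit clear: aggregate host capacity suffices; memory may
            # be drawn from multiple NUMA nodes of the host.
            total_mem = sum(n.local_memory_gb for n in host.numa_nodes)
            has_accel = any(n.accelerators for n in host.numa_nodes)
            if total_mem >= task.memory_gb and has_accel:
                return host.name, None
    return None


# Usage: host_a has 2 x 32 GB NUMA nodes; host_b has one 128 GB node.
host_a = Host("host-a", [NumaNode(0, 32, ["acc0"]), NumaNode(1, 32, ["acc1"])])
host_b = Host("host-b", [NumaNode(0, 128, ["acc0"])])

# A 64 GB task requiring locality cannot fit any single node of host-a,
# so it is placed on host-b's node 0.
local_placement = assign_task(Task("t1", 64, numa_local=True), [host_a, host_b])

# The same task without the locality bit fits host-a's aggregate 64 GB.
spread_placement = assign_task(Task("t2", 64, numa_local=False), [host_a, host_b])
```

The two placements differ only because of the protocol bit, mirroring the claim: the bit, together with the per-host NUMA topology, determines which host (and which node within it) receives the task.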