US 12,353,908 B2
Scheduler for planet-scale computing system
Muthian Sivathanu, Chennai (IN); Atul Katiyar, Sammamish, WA (US); Dharma Kiritkumar Shukla, Bellevue, WA (US); Rimma Vladimirovna Nehme, Bellevue, WA (US); Shreshth Singhal, Seattle, WA (US); Pankaj Sharma, Redmond, WA (US); Nipun Kwatra, Bangalore (IN); and Ramachandran Ramjee, Bengaluru (IN)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jun. 28, 2021, as Appl. No. 17/361,224.
Claims priority of application No. 202141014650 (IN), filed on Mar. 30, 2021.
Prior Publication US 2022/0318052 A1, Oct. 6, 2022
Int. Cl. G06F 9/48 (2006.01); G06F 9/50 (2006.01)
CPC G06F 9/4881 (2013.01) [G06F 9/5038 (2013.01); G06F 9/5088 (2013.01)] 15 Claims
OG exemplary drawing
 
1. A system for scheduling execution of artificial intelligence (AI) workloads in a cloud infrastructure platform, the system comprising:
at least one processor of the cloud infrastructure platform, wherein the at least one processor includes at least one of: a central processing unit (CPU), a graphics processing unit (GPU), and a hardware accelerator; and
at least one memory of the cloud infrastructure platform comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the at least one processor to:
schedule, by the scheduler, a first AI workload to a first node of the cloud infrastructure platform, the first AI workload being associated with a first priority tier indicative of a preemption priority with which the first AI workload is associated while being executed, wherein the first node includes infrastructure resources for use in executing AI workloads, and wherein the first AI workload is distributed to the first node based on the first priority tier of the first AI workload, wherein the first AI workload requires AI accelerator hardware to execute, the infrastructure resources including the AI accelerator hardware;
based at least on scheduling the first AI workload, execute the first AI workload on the infrastructure resources of the first node; and
schedule, by the scheduler, a multi-node AI workload associated with a second priority tier to a plurality of nodes, the multi-node AI workload requiring AI accelerator hardware to execute, the plurality of nodes including the first node, each of the plurality of nodes including infrastructure resources including AI accelerator hardware for use in executing AI workloads, wherein the scheduling of the multi-node AI workload includes preempting the first AI workload associated with the first priority tier that is lower priority than the second priority tier, the preempting freeing capacity of the AI accelerator hardware for use by the multi-node AI workload.
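The claim recites a tier-based scheduler that first places a single-node AI workload, then gang-schedules a higher-tier multi-node workload by preempting lower-tier work to free AI accelerator capacity. The sketch below illustrates that flow under stated assumptions only: the Workload, Node, and TierScheduler names, the numeric tier ordering (lower value = higher priority), and the requeue handling are hypothetical and are not drawn from the patented implementation.

```python
# Minimal sketch of priority-tier scheduling with preemption, loosely modeled on claim 1.
# All class names, tier numbering, and requeue behavior are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Workload:
    name: str
    tier: int                 # assumption: lower value = higher priority tier
    accelerators_needed: int  # AI accelerators (e.g. GPUs) required on each node


@dataclass
class Node:
    name: str
    accelerators_total: int
    running: List[Workload] = field(default_factory=list)

    def free(self) -> int:
        return self.accelerators_total - sum(w.accelerators_needed for w in self.running)


class TierScheduler:
    """Places workloads on nodes, preempting strictly lower-priority tiers when needed."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes
        self.preempted: List[Workload] = []  # a real scheduler would requeue these

    def schedule(self, workload: Workload, nodes: List[Node]) -> bool:
        """Gang-schedule a workload onto all given nodes, or fail without placing it."""
        plan: List[Tuple[Node, List[Workload]]] = []
        for n in nodes:
            victims = [] if n.free() >= workload.accelerators_needed else [
                w for w in n.running if w.tier > workload.tier  # only lower-priority tiers
            ]
            if n.free() + sum(w.accelerators_needed for w in victims) < workload.accelerators_needed:
                return False  # cannot free enough accelerator capacity on this node
            plan.append((n, victims))
        for n, victims in plan:
            for v in victims:  # preempt (sketch evicts all lower-tier victims, not a minimal set)
                n.running.remove(v)
                self.preempted.append(v)
            n.running.append(workload)
        return True


# Usage mirroring the claim: a lower-tier job lands on node A; a higher-tier multi-node
# job spanning nodes A and B then preempts it to reclaim accelerator capacity.
a, b = Node("A", accelerators_total=8), Node("B", accelerators_total=8)
sched = TierScheduler([a, b])
sched.schedule(Workload("job-low", tier=2, accelerators_needed=8), [a])
sched.schedule(Workload("job-multi", tier=1, accelerators_needed=8), [a, b])
print([w.name for w in a.running],
      [w.name for w in b.running],
      [w.name for w in sched.preempted])
# -> ['job-multi'] ['job-multi'] ['job-low']
```

In this sketch the scheduler checks that capacity can be freed on every target node before preempting anything, which loosely parallels the claim's requirement that preemption free accelerator capacity for the multi-node workload; checkpointing, requeueing, and fairness policies of the actual system are not modeled.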