US 11,816,509 B2
Workload placement for virtual GPU enabled systems
Hari Sivaraman, Livermore, CA (US); Uday Pundalik Kurkure, Los Altos Hills, CA (US); and Lan Vu, Palo Alto, CA (US)
Assigned to VMWARE, INC., Palo Alto, CA (US)
Filed by VMware, Inc., Palo Alto, CA (US)
Filed on Jan. 14, 2020, as Appl. No. 16/742,108.
Prior Publication US 2021/0216375 A1, Jul. 15, 2021
Int. Cl. G06F 9/50 (2006.01); G06F 9/54 (2006.01); G06F 9/30 (2018.01); G06N 3/045 (2023.01)

CPC G06F 9/5077 (2013.01) [G06F 9/30029 (2013.01); G06F 9/5011 (2013.01); G06F 9/5083 (2013.01); G06F 9/546 (2013.01); G06N 3/045 (2023.01)]

20 Claims

1. A system comprising a computing environment comprising a cluster of computing devices that provide host resources comprising a plurality of virtual graphics processing unit (vGPU)-enabled graphic processing units (GPUs) and at least one data store, wherein the at least one data store includes: a plurality of workloads, a scheduling service, a simulator, a plurality of vGPU placement models, and a plurality of vGPU placement neural networks, a respective vGPU placement neural network comprising two or more sets of layers, and wherein the at least one data store comprises instructions that when executed by at least one processor of at least one computing device of the cluster, cause the scheduling service to at least:

generate a plurality of configurations for the plurality of vGPU placement neural networks;

identify GPU data and workload data by monitoring workloads executed or simulated on the vGPU-enabled GPUs based at least in part on the plurality of vGPU placement models, wherein the GPU data and the workload data for a respective vGPU placement model are generated based at least in part on placing the workloads in different datacenter configurations comprising a plurality of different arrival rates and a plurality of different GPU counts;

train the plurality of vGPU placement neural networks to maximize a composite efficiency metric of a respective workload based on the GPU data and the workload data;

generate, for at least one candidate workload selected from the workloads, a respective prediction vector using a respective vGPU placement neural network of the plurality of vGPU placement neural networks, wherein the respective vGPU placement neural network is associated with a precision value;

generate a scaled-add combined neural network selector that multiplies the respective prediction vector by the precision value of the respective vGPU placement neural network; and

utilize the scaled-add combined neural network selector to select a particular workload of the at least one candidate workload to execute using at least one of the vGPU-enabled GPUs.
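
The last three steps of the claim (per-network prediction vectors, scaling by each network's precision value, and selecting a workload) can be illustrated with a minimal Python sketch. The claim states only that the selector multiplies each prediction vector by the precision value of its network. Summing the scaled vectors element-wise and choosing the highest combined score are assumptions suggested by the name "scaled-add", not details recited in the claim. The function name `scaled_add_select` and the sample numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def scaled_add_select(
    prediction_vectors: Sequence[Sequence[float]],
    precisions: Sequence[float],
) -> Tuple[int, List[float]]:
    """Return (index of selected candidate workload, combined score vector).

    prediction_vectors[k][i] is the score that network k gives candidate i.
    precisions[k] is the precision value associated with network k.
    Assumption: scaled vectors are summed and the top score wins.
    """
    if not prediction_vectors or len(prediction_vectors) != len(precisions):
        raise ValueError("need one precision value per prediction vector")
    n_candidates = len(prediction_vectors[0])
    if n_candidates == 0:
        raise ValueError("prediction vectors must not be empty")
    if any(len(vec) != n_candidates for vec in prediction_vectors):
        raise ValueError("all prediction vectors must have the same length")

    combined = [0.0] * n_candidates
    for vec, precision in zip(prediction_vectors, precisions):
        for i, score in enumerate(vec):
            # "Scale": multiply by the network's precision; "add": accumulate.
            combined[i] += precision * score

    # Assumed selection rule: pick the candidate with the highest combined score.
    best = max(range(n_candidates), key=combined.__getitem__)
    return best, combined


# Hypothetical example: two networks scoring three candidate workloads.
predictions = [
    [0.2, 0.7, 0.1],  # network 0
    [0.6, 0.3, 0.1],  # network 1
]
precisions = [0.9, 0.5]
selected, scores = scaled_add_select(predictions, precisions)
print(selected, scores)
```

In this example the higher-precision network dominates the combined score, so its preferred candidate (index 1) is selected even though the other network prefers index 0.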