US 12,229,602 B2
Memory-aware request placement for virtual GPU enabled systems
Anshuj Garg, Jabalpur (IN); Uday Pundalik Kurkure, Los Altos Hills, CA (US); Hari Sivaraman, Livermore, CA (US); and Lan Vu, Palo Alto, CA (US)
Assigned to VMware LLC, Palo Alto, CA (US)
Filed by VMware LLC, Palo Alto, CA (US)
Filed on Apr. 29, 2022, as Appl. No. 17/733,284.
Application 17/733,284 is a continuation of application No. 16/550,313, filed on Aug. 26, 2019, granted, now 11,372,683.
Prior Publication US 2022/0253341 A1, Aug. 11, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/46 (2006.01); G06F 9/455 (2018.01); G06F 9/50 (2006.01)
CPC G06F 9/5044 (2013.01) [G06F 9/45558 (2013.01); G06F 9/5016 (2013.01); G06F 2009/45579 (2013.01); G06F 2009/45583 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A non-transitory computer-readable medium comprising machine readable instructions, wherein the instructions, when executed by at least one processor, cause at least one computing device to perform operations comprising:
executing a scheduling service in a computing environment comprising one or more host computers, each of the one or more host computers having a virtualization layer that provides virtualized hardware for one or more virtualized computing instances (VCI);
identifying, by the scheduling service, a plurality of graphics processing units (GPUs) in the computing environment, wherein each of the plurality of GPUs is configured with a virtual GPU (vGPU) profile comprising a memory reservation that represents a maximum GPU memory requirement that the respective GPU will support with that respective configured vGPU profile;
sorting, by the scheduling service, a first list of the plurality of configured GPUs in increasing order of the memory reservation of the vGPU profile of each configured GPU;
receiving, by the scheduling service, a plurality of graphics processing requests, each respective graphics processing request comprising a GPU memory requirement;
sorting, by the scheduling service, a second list of the plurality of graphics processing requests according to a vGPU request placement model based on a memory requirement of each respective graphics processing request;
determining, by the scheduling service and with the vGPU request placement model that considers the respective GPU memory requirement of each graphics processing request and the respective memory reservation of the respective vGPU profile of each configured GPU, that a first configured GPU in the sorted first list has a memory reservation that meets a memory requirement of a first memory request in the sorted second list; and
assigning, based on a determination that the first configured GPU in the sorted first list has a memory reservation that meets a memory requirement of the first memory request in the sorted second list, the first memory request to the first configured GPU.
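The claim recites a memory-aware placement scheme: configured GPUs are sorted by the memory reservation of their vGPU profile, incoming requests are sorted by their memory requirement, and each request is assigned to a configured GPU whose reservation meets that requirement. The sketch below illustrates that idea only; it is not the patented implementation, and all names (ConfiguredGPU, GraphicsRequest, place_requests, the *_mib fields) and the one-request-per-GPU simplification are hypothetical assumptions made for illustration.

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ConfiguredGPU:
    # Hypothetical structure; field names are illustrative only.
    gpu_id: str
    vgpu_profile: str
    memory_reservation_mib: int  # max GPU memory the configured vGPU profile supports


@dataclass
class GraphicsRequest:
    request_id: str
    memory_requirement_mib: int  # GPU memory the request needs


def place_requests(gpus: List[ConfiguredGPU],
                   requests: List[GraphicsRequest]) -> Dict[str, str]:
    """Greedy placement: sort GPUs by profile memory reservation and requests
    by memory requirement, then give each request the smallest GPU that fits."""
    # First list: configured GPUs in increasing order of profile memory reservation.
    available = sorted(gpus, key=lambda g: g.memory_reservation_mib)
    # Second list: requests in increasing order of their memory requirement.
    ordered = sorted(requests, key=lambda r: r.memory_requirement_mib)

    assignments: Dict[str, str] = {}  # request_id -> gpu_id
    for req in ordered:
        chosen: Optional[ConfiguredGPU] = None
        for gpu in available:
            if gpu.memory_reservation_mib >= req.memory_requirement_mib:
                chosen = gpu  # first (smallest) reservation that meets the requirement
                break
        if chosen is not None:
            assignments[req.request_id] = chosen.gpu_id
            available.remove(chosen)  # simplification: one request per configured GPU
    return assignments

Because both lists are sorted in increasing order, the greedy pass picks the smallest adequate vGPU profile for each request, which tends to leave larger-reservation GPUs available for larger requests; the one-request-per-GPU assumption is a simplification of the sketch, not a limitation recited in the claim.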