US 11,886,919 B2
Directing queries to nodes of a cluster of a container orchestration platform distributed across a host system and a hardware accelerator of the host system
Diman Zad Tootaghaj, Milpitas, CA (US); Anu Mercian, Milpitas, CA (US); Vivek Adarsh, Santa Barbara, CA (US); and Puneet Sharma, Milpitas, CA (US)
Assigned to Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed by Hewlett Packard Enterprise Development LP, Houston, TX (US)
Filed on Jul. 26, 2022, as Appl. No. 17/814,895.
Application 17/814,895 is a continuation of application No. 17/222,160, filed on Apr. 5, 2021, granted, now 11,436,054.
Prior Publication US 2022/0382593 A1, Dec. 1, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/50 (2006.01); G06F 9/455 (2018.01); G06F 9/54 (2006.01)

CPC G06F 9/5027 (2013.01) [G06F 9/45558 (2013.01); G06F 9/547 (2013.01); G06F 2009/4557 (2013.01)]

20 Claims

1. A method comprising:

receiving queries at a gateway of a cluster of a container orchestration platform, wherein the cluster includes a host system comprising a first processing resource and a hardware accelerator of the host system, the hardware accelerator comprising a second processing resource, wherein the host system comprising the first processing resource or the hardware accelerator comprising the second processing resource comprises a primary node of the cluster, and wherein the host system and the hardware accelerator comprise a plurality of worker nodes of the cluster;

distributing, at the gateway, the queries among the plurality of worker nodes of the host system and the hardware accelerator based on a queuing model that takes into consideration respective compute capacities of the first processing resource of the host system and the second processing resource of the hardware accelerator;

performing auto-scaling to run a quantity of instances of an application in the plurality of worker nodes; and

responsive to receipt of the queries, directing the queries to the plurality of worker nodes according to the distributing for processing, by the plurality of worker nodes, the quantity of instances of the application.
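
The distributing and auto-scaling steps of claim 1 can be illustrated with a minimal sketch. The claim does not name a specific queuing model, so the sketch assumes a simple one: each worker node is treated as a single queue with a known average service rate, and the gateway sends each query to the node with the lowest expected completion time. The names `Worker`, `dispatch`, and `instances_needed`, the service rates, and the 0.7 target utilization are all hypothetical and are not taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    """A worker node on the host system or on the hardware accelerator."""
    name: str
    service_rate: float  # assumed: average queries per second the node can process
    queued: int = 0      # queries currently assigned to this node

    def expected_delay(self) -> float:
        # Assumed model: a new query waits for everything already queued,
        # then is served itself, each taking 1 / service_rate on average.
        return (self.queued + 1) / self.service_rate


def dispatch(workers: list[Worker], num_queries: int) -> list[str]:
    """Assign each query to the worker with the lowest expected delay.

    Ties go to the worker listed first.
    """
    assignments = []
    for _ in range(num_queries):
        target = min(workers, key=lambda w: w.expected_delay())
        target.queued += 1
        assignments.append(target.name)
    return assignments


def instances_needed(arrival_rate: float, per_instance_rate: float,
                     target_utilization: float = 0.7) -> int:
    """Hypothetical auto-scaling rule: run enough application instances
    that each one stays at or below the target utilization."""
    return max(1, math.ceil(arrival_rate / (per_instance_rate * target_utilization)))


if __name__ == "__main__":
    # Assumed capacities: the host processes queries four times faster
    # than the accelerator.
    nodes = [Worker("host", service_rate=4.0), Worker("accelerator", service_rate=1.0)]
    plan = dispatch(nodes, 10)
    print(plan.count("host"), plan.count("accelerator"))
    print(instances_needed(arrival_rate=10, per_instance_rate=4))
```

Under these assumptions the split of queries tracks the ratio of the two service rates, which is one way a gateway could take the respective compute capacities into consideration. A real deployment would measure service rates rather than hard-code them.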