CPC G06F 9/5027 (2013.01) [G06F 9/45558 (2013.01); G06F 9/547 (2013.01); G06F 2009/4557 (2013.01)] | 20 Claims |
1. A method comprising:
receiving queries at a gateway of a cluster of a container orchestration platform, wherein the cluster includes a host system comprising a first processing resource and a hardware accelerator of the host system, the hardware accelerator comprising a second processing resource, wherein the host system comprising the first processing resource or the hardware accelerator comprising the second processing resource comprises a primary node of the cluster, and wherein the host system and the hardware accelerator comprise a plurality of worker nodes of the cluster;
distributing, at the gateway, the queries among the plurality of worker nodes of the host system and the hardware accelerator based on a queuing model that takes into consideration respective compute capacities of the first processing resource of the host system and the second processing resource of the hardware accelerator;
performing auto-scaling to run a quantity of instances of an application in the plurality of worker nodes; and
responsive to receipt of the queries, directing the queries to the plurality of worker nodes according to the distributing for processing, by the plurality of worker nodes, the quantity of instances of the application.
|
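The distributing and auto-scaling steps of claim 1 can be illustrated with a minimal sketch. The claim does not specify the queuing model, so everything here is an assumption made for illustration: each worker node is modeled as a simple queue whose compute capacity is expressed as a per-instance service rate, the gateway sends each query to the node with the smallest expected completion delay, and the instance count is sized from an assumed arrival rate and target utilization. The names `WorkerNode`, `route`, `instances_needed`, and all numeric values are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkerNode:
    """Hypothetical worker node on the host system or the hardware accelerator."""
    name: str
    service_rate: float  # assumed capacity measure: queries/second per instance
    instances: int = 1   # application instances currently running on this node
    queued: int = 0      # queries routed to this node and not yet completed

    def expected_delay(self) -> float:
        # Estimated time for one more query to finish on this node:
        # (queries ahead of it + itself) / aggregate service rate.
        return (self.queued + 1) / (self.service_rate * self.instances)


def route(nodes):
    """Gateway step: pick the node with the smallest expected delay and enqueue there."""
    node = min(nodes, key=WorkerNode.expected_delay)
    node.queued += 1
    return node


def instances_needed(arrival_rate, service_rate, target_utilization=0.7):
    """Auto-scaling step (assumed rule): keep utilization at or below the target."""
    return max(1, math.ceil(arrival_rate / (service_rate * target_utilization)))


# Assumed capacities: the accelerator processes three times as many queries/second.
host = WorkerNode("host-cpu", service_rate=10.0)
accel = WorkerNode("accelerator", service_rate=30.0)

for _ in range(8):
    route([host, accel])

print(host.queued, accel.queued)
print(instances_needed(arrival_rate=100.0, service_rate=30.0))
```

Under these assumed capacities, the routing rule ends up splitting queries roughly in proportion to each node's service rate, which is one plausible reading of a model that "takes into consideration respective compute capacities". Other queuing models, such as weighted round-robin or an M/M/c formulation, would fit the claim language equally well.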