US 11,995,466 B1
Scaling down computing resource allocations for execution of containerized applications
Archana Srikanta, Bellevue, WA (US); Onur Filiz, Redmond, WA (US); Prashant Prahlad, Seattle, WA (US); Amit Gupta, Bellevue, WA (US); and Song Hu, Sammamish, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 30, 2021, as Appl. No. 17/305,145.
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/50 (2006.01); G06F 9/455 (2018.01); G06F 9/48 (2006.01); H04L 67/63 (2022.01); G06F 8/60 (2018.01)

CPC G06F 9/5005 (2013.01) [G06F 9/455 (2013.01); G06F 9/45558 (2013.01); G06F 9/48 (2013.01); G06F 9/4806 (2013.01); G06F 9/4843 (2013.01); G06F 9/485 (2013.01); G06F 9/4881 (2013.01); G06F 9/50 (2013.01); G06F 9/5011 (2013.01); G06F 9/5022 (2013.01); G06F 9/5027 (2013.01); G06F 9/505 (2013.01); G06F 9/5061 (2013.01); G06F 9/5072 (2013.01); G06F 9/5077 (2013.01); G06F 9/5083 (2013.01); H04L 67/63 (2022.05); G06F 8/60 (2013.01)]

20 Claims

1. A cloud provider system comprising:

a serverless container management service configured to acquire compute capacity and execute a task using the acquired compute capacity in response to a request to execute the task; and

an application execution service in networked communication with the serverless container management service,

wherein the application execution service is configured to at least:

receive a request to deploy a Web application configured to handle a plurality of HTTP requests;

transmit, to the serverless container management service, a task execution request to execute a set of tasks to be used to implement the Web application, wherein the set of tasks includes at least a first task and a second task that are each configured to handle a subset of the plurality of HTTP requests directed to the Web application, and wherein the task execution request indicates that each of the set of tasks is to be allocated a first amount of computing resources; and

based at least in part on the first task associated with the Web application being at a first priority level and the second task associated with the Web application being at a second priority level lower than the first priority level, route a set of HTTP requests directed to the Web application to the first task, and

wherein the serverless container management service is further configured to at least:

determine that the first task has finished handling the set of HTTP requests routed to the first task and no other HTTP requests are waiting to be handled by the first task;

cause the first task to be transitioned from (i) an active mode in which the first amount of computing resources is allocated to the first task and the first task is configured to handle HTTP requests using the first amount of computing resources, to (ii) a standby mode in which a second amount of computing resources greater than zero and less than the first amount of computing resources is allocated to the first task and the first task is configured to handle HTTP requests using the second amount of computing resources; and

subsequent to the transitioning of the first task from the active mode to the standby mode, cause the first task to be re-transitioned from (a) being configured to handle HTTP requests using the second amount of computing resources in the standby mode to (b) being configured to handle HTTP requests using an amount of computing resources that is greater than the second amount of computing resources in the active mode.
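
As an informal illustration only, and not the claimed implementation, the priority-based routing and the active/standby transitions recited in claim 1 can be sketched as a small state machine. Every name here (`Task`, `Mode`, `route`, `finish`, `scale_down_if_idle`, `scale_up`) is hypothetical, as are the single CPU figure standing in for "computing resources" and the convention that a larger number means a higher priority. The claim only requires the re-transitioned allocation to exceed the second amount; this sketch assumes it returns to the first amount.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class Task:
    name: str
    priority: int        # assumption: larger number = higher priority level
    first_amount: float  # resources allocated in active mode
    second_amount: float # resources allocated in standby mode
    mode: Mode = Mode.ACTIVE
    pending: int = 0     # requests routed to this task and not yet handled

    def __post_init__(self) -> None:
        # The standby allocation is greater than zero and less than the
        # active allocation, so the task can still handle requests.
        if not 0 < self.second_amount < self.first_amount:
            raise ValueError("need 0 < second_amount < first_amount")

    @property
    def allocated(self) -> float:
        if self.mode is Mode.ACTIVE:
            return self.first_amount
        return self.second_amount


def route(tasks: list[Task], n_requests: int) -> Task:
    """Send a batch of requests to the task at the highest priority level."""
    target = max(tasks, key=lambda t: t.priority)
    target.pending += n_requests
    return target


def finish(task: Task, n_requests: int) -> None:
    """Record that the task has finished handling n_requests."""
    if n_requests > task.pending:
        raise ValueError("cannot finish more requests than are pending")
    task.pending -= n_requests


def scale_down_if_idle(task: Task) -> bool:
    """Move an active task to standby once no requests are waiting."""
    if task.mode is Mode.ACTIVE and task.pending == 0:
        task.mode = Mode.STANDBY
        return True
    return False


def scale_up(task: Task) -> None:
    """Return a standby task to active mode (assumed: back to first_amount)."""
    if task.mode is Mode.STANDBY:
        task.mode = Mode.ACTIVE
```

In this sketch, a task that still has pending requests is never scaled down, and a standby task keeps a nonzero allocation, which is what distinguishes the standby mode from stopping the task outright.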