US 12,229,600 B1
Resource management techniques to reduce startup overhead for machine learning tasks
Ramyanshu Datta, Campbell, CA (US); Zhihan Li, Seattle, WA (US); Arun Babu Nagarajan, Redmond, WA (US); Arvind Sowmyan, Seattle, WA (US); Kohen Berith Chia, Seattle, WA (US); Wei You, Kirkland, WA (US); Ishaaq Chandy, Bellevue, WA (US); Kunal Mehrotra, Kirkland, WA (US); Andrea Olgiati, Gilroy, CA (US); Lakshmi Naarayanan Ramakrishnan, Redmond, WA (US); and Saurabh Gupta, Sammamish, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Sep. 22, 2021, as Appl. No. 17/482,272.
Int. Cl. G06F 9/50 (2006.01)

CPC G06F 9/5033 (2013.01) [G06F 9/5022 (2013.01); G06F 9/5038 (2013.01); G06F 9/5077 (2013.01)]

20 Claims

1. A system, comprising:

one or more computing devices;

wherein the one or more computing devices include instructions that upon execution on or across the one or more computing devices cause the one or more computing devices to:

store, at a machine learning service of a provider network, in response to a pool establishment request, a first set of parameters of a first pool of compute instances to be assigned for machine learning tasks requested by one or more entities of a first set of entities, wherein the first set of parameters includes (a) a maximum number of compute instances of a first category of compute instances of a computing service of the provider network which are to be included in the first pool, wherein the computing service provides compute instances of a plurality of categories, wherein the first category differs from other categories of the plurality of categories in at least one performance capacity, and (b) a post-task-completion retention period during which, after completion of a machine learning task at a compute instance of the first category within the first pool, a data set accessed by the completed machine learning task at the compute instance is to be retained at the compute instance by the machine learning service;

include, for a first compute instance of the first category that has completed a task and based on verification that a total number of in-use compute instances of the first category in the first pool is less than the maximum number, the first compute instance in the first pool;

determine responsive to a first machine learning task indicated in a first task request, from an entity of the first set of entities and that does not specify a compute instance to be used for the first machine learning task, that (a) one or more networking configuration settings of the first compute instance in the pool satisfy a networking requirement indicated in the first task request, and (b) the post-task-completion retention period of the first compute instance in the pool, relative to a completion of an earlier machine learning task at the first compute instance, has not expired;

assign, by the machine learning service and based on said determine, the first compute instance to the first machine learning task indicated in the first task request such that the first compute instance performs, based on said assign, the first machine learning task; and

cause, by the machine learning service, a result of the first machine learning task to be stored, wherein the result is obtained at the first compute instance.
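
Illustrative sketch (not part of the claims): the steps recited in claim 1 can be loosely modeled as a small warm-pool manager. The Python below is a minimal sketch under stated assumptions, not the patented implementation. All names (`PoolParams`, `WarmInstance`, `WarmPool`, `TaskRequest`, `run_on_pool`) are hypothetical, networking settings are simplified to sets of labels, and the capacity check counts every instance the pool tracks, which is only one possible reading of "in-use compute instances".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass
class PoolParams:
    """Parameters stored in response to a pool establishment request."""
    instance_category: str    # the "first category" of compute instance
    max_instances: int        # maximum number of instances in the pool
    retention_seconds: float  # post-task-completion retention period


@dataclass
class WarmInstance:
    instance_id: str
    category: str
    network_settings: FrozenSet[str]  # assumption: settings modeled as labels
    task_completed_at: float          # when the earlier task finished
    in_use: bool = False


@dataclass
class TaskRequest:
    task_id: str
    network_requirements: FrozenSet[str]
    run: Callable[[str], str]  # hypothetical task body; takes an instance id


@dataclass
class WarmPool:
    params: PoolParams
    instances: List[WarmInstance] = field(default_factory=list)

    def try_include(self, inst: WarmInstance) -> bool:
        """Add an instance that just finished a task, if the pool has room."""
        if inst.category != self.params.instance_category:
            return False
        # Assumption: capacity is checked against all tracked instances.
        if len(self.instances) >= self.params.max_instances:
            return False
        self.instances.append(inst)
        return True

    def find_match(self, request: TaskRequest, now: float) -> Optional[WarmInstance]:
        """Return an idle instance whose networking satisfies the request
        and whose retention period has not expired, or None."""
        for inst in self.instances:
            if inst.in_use:
                continue
            retained = (now - inst.task_completed_at) < self.params.retention_seconds
            network_ok = request.network_requirements <= inst.network_settings
            if retained and network_ok:
                return inst
        return None


def run_on_pool(pool: WarmPool, request: TaskRequest, now: float,
                results: Dict[str, str]) -> Optional[str]:
    """Assign a matching warm instance, run the task, and store the result."""
    inst = pool.find_match(request, now)
    if inst is None:
        return None  # a real service would presumably start a fresh instance
    inst.in_use = True
    results[request.task_id] = request.run(inst.instance_id)
    inst.in_use = False
    inst.task_completed_at = now  # the retention period restarts from here
    return inst.instance_id
```

In this sketch the request never names an instance; the pool selects one, mirroring the claim's requirement that the task request "does not specify a compute instance". A request arriving after `retention_seconds` have elapsed, or carrying a networking requirement that the retained instance does not meet, finds no match.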