US 12,306,738 B1
Local runtime replication for machine learning model training on a provider network
Zhankui Lu, Redmond, WA (US); Manoj Ravi, Mountain View, CA (US); Uday Kumar Bandaru, Sammamish, WA (US); Arun Babu Nagarajan, Redmond, WA (US); Dipankar Patro, Bothell, WA (US); Khushboo Srivastava, Walnut Creek, CA (US); Brian Ellison Granger, San Luis Obispo, CA (US); and Weixun Wang, Kirkland, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 30, 2023, as Appl. No. 18/345,330.
Int. Cl. G06F 11/3604 (2025.01); G06F 8/41 (2018.01)
CPC G06F 11/3612 (2013.01) [G06F 8/433 (2013.01)]
20 Claims
1. A computer-implemented method comprising:
in response to a call in a machine learning code to utilize a model training system of a provider network to execute a function, in a software developer kit (SDK) installed on a local machine:
generating local runtime environment information for a virtualized, local environment and providing the virtualized, local runtime environment information to the model training system of the provider network, wherein the virtualized, local runtime environment information is containerized,
serializing a function and arguments and providing the serialized function and arguments to the model training system of the provider network,
generating a remote job request based at least in part on local runtime environment information,
causing the model training system of the provider network to execute the function remotely according to the remote job request using a replication of the virtualized, local environment; and
receiving a result of the execution of the function, wherein the call includes at least one of:
an indication of one or more dependencies to install,
one or more pre-execution commands,
one or more pre-execution scripts,
one or more environment variables,
an indication of whether the remote function should include local directories,
a prefix to be used to create an underlying remote job request,
an indication of a time limit to retain provisioned infrastructure,
an indication of a session to use for service calls,
a listing of security requirements,
a listing of subnet identifiers,
a list of tags to attach to the job,
an indication of whether traffic between remote compute containers is to be encrypted during remote compute, or
an indication of whether a container will be isolated.
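
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation or any real SDK's API: the names `CallOptions`, `capture_local_runtime`, `serialize_call`, `build_job_request`, `fake_training_service`, and `run_remotely` are all hypothetical, the "provider network" is simulated by an in-process function, and the standard-library `marshal` and `pickle` modules stand in for whatever serializer a real SDK would use. The sketch assumes a simple function with no closures or default arguments.

```python
import builtins
import marshal
import pickle
import platform
import types
from dataclasses import dataclass, field, asdict


@dataclass
class CallOptions:
    """Hypothetical subset of the optional call settings listed in the claim."""
    dependencies: list = field(default_factory=list)
    pre_execution_commands: list = field(default_factory=list)
    environment_variables: dict = field(default_factory=dict)
    include_local_directories: bool = False
    job_name_prefix: str = "job"
    keep_alive_seconds: int = 0
    tags: list = field(default_factory=list)
    encrypt_inter_container_traffic: bool = False
    isolate_container: bool = False


def capture_local_runtime():
    """Describe the local interpreter so the remote side could replicate it."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
    }


def serialize_call(func, args, kwargs):
    """Serialize the function's code object and its arguments to bytes."""
    return {
        "name": func.__name__,
        "code": marshal.dumps(func.__code__),
        "payload": pickle.dumps((args, kwargs)),
    }


def build_job_request(runtime_info, options):
    """Build a remote job request based on the local runtime information."""
    return {
        "job_name": f"{options.job_name_prefix}-remote-function",
        "runtime": runtime_info,
        "options": asdict(options),
    }


def fake_training_service(job_request, serialized):
    """In-process stand-in for the provider side: rebuild and run the function."""
    # A real service would first provision a container matching job_request["runtime"].
    code = marshal.loads(serialized["code"])
    func = types.FunctionType(code, {"__builtins__": builtins}, serialized["name"])
    args, kwargs = pickle.loads(serialized["payload"])
    return pickle.dumps(func(*args, **kwargs))


def run_remotely(func, *args, options=None, **kwargs):
    """Walk through the claimed steps in order and return the deserialized result."""
    options = options or CallOptions()
    runtime_info = capture_local_runtime()
    serialized = serialize_call(func, args, kwargs)
    job_request = build_job_request(runtime_info, options)
    result_bytes = fake_training_service(job_request, serialized)
    return pickle.loads(result_bytes)


def add(a, b):
    return a + b
```

A call such as `run_remotely(add, 2, 3, options=CallOptions(job_name_prefix="demo"))` passes through each step in turn: runtime capture, serialization, job request generation, simulated remote execution, and receipt of the result. Because `marshal` output is specific to the Python version, the sketch only works when both sides run the same interpreter, which is one reason replicating the local runtime on the remote side matters.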