US 12,112,259 B1
Dynamic environment configurations for machine learning services
Leo Parker Dirac, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Nov. 21, 2018, as Appl. No. 16/198,733.
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G06F 30/20 (2020.01)
CPC G06N 3/08 (2013.01) [G06F 30/20 (2020.01)]
19 Claims
1. A system for generating trained reinforcement learning models, the system comprising:
a plurality of processing devices corresponding to a simulation environment for a set of simulated, hosted customer networks; and
a non-transitory computer readable medium storing instructions that, when executed by the at least a first processing device, cause the system to perform operations including:
receiving, from a client device, a request to provide a generated trained reinforcement learning model on training data generated from an identified first simulated, hosted customer network;
responsive to the request from the client device, selecting a first simulated, hosted customer network from a set of simulated, hosted customer networks using the received request as selection criteria;
identifying, from a data store, a simulator for generating the training data corresponding to a network simulation of an identified network and for training a reinforcement learning model based at least in part on the request;
identifying a reference reinforcement learning model for the request;
instantiating a simulation environment based at least in part on a parameter from an environment used to train the reference reinforcement learning model;
instantiating an agent in the simulation environment including the simulator, the agent being configured using at least one parameter used to train the reference reinforcement learning model;
activating the agent for a simulation period to form the trained reinforcement learning model, wherein each activation includes a state and a reward for a previous action taken by the agent; and
providing to a client device the generated trained reinforcement learning model, of a simulated reinforcement learning model of the training data, in response to the request.
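The instantiating and activating steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: the class and function names (ReferenceModel, ToyNetworkSimulator, QAgent, train) are hypothetical, the "network" is a toy five-level load model, and tabular Q-learning with epsilon-greedy exploration is swapped in as one possible learning rule, since the claim does not name one. The sketch shows an environment and an agent configured from parameters of a reference model, and an agent whose every activation receives a state and the reward for its previous action.

```python
import random
from dataclasses import dataclass


@dataclass
class ReferenceModel:
    """Hypothetical record of how an earlier model was trained."""
    env_params: dict    # e.g. {"n_levels": 5}
    agent_params: dict  # e.g. {"alpha": 0.5, "gamma": 0.9, "epsilon": 0.1}


class ToyNetworkSimulator:
    """Toy stand-in for a network simulator (not from the patent).

    The state is a link-load level in [0, n_levels - 1]. Action 0 lowers
    the load by one level and action 1 raises it, clamped at the ends.
    """

    def __init__(self, n_levels, seed=0):
        self.n_levels = n_levels
        self.state = random.Random(seed).randrange(n_levels)

    def step(self, action):
        delta = 1 if action == 1 else -1
        self.state = min(self.n_levels - 1, max(0, self.state + delta))
        # Arbitrary toy objective: stay near the middle load level.
        reward = -abs(self.state - self.n_levels // 2)
        return self.state, float(reward)


class QAgent:
    """Tabular Q-learning agent; the learning rule is an assumption."""

    def __init__(self, n_states, n_actions, alpha, gamma, epsilon, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.prev = None  # (state, action) from the previous activation

    def activate(self, state, reward):
        """One activation: current state plus reward for the previous action."""
        if self.prev is not None:
            s, a = self.prev
            target = reward + self.gamma * max(self.q[state])
            self.q[s][a] += self.alpha * (target - self.q[s][a])
        n_actions = len(self.q[state])
        if self.rng.random() < self.epsilon:
            action = self.rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: self.q[state][a])
        self.prev = (state, action)
        return action


def train(reference, steps, seed=0):
    """Build the environment and agent from the reference, then run the loop."""
    n_levels = reference.env_params["n_levels"]
    sim = ToyNetworkSimulator(n_levels, seed=seed)
    agent = QAgent(n_levels, 2, seed=seed, **reference.agent_params)
    state, reward = sim.state, 0.0
    for _ in range(steps):  # the "simulation period"
        action = agent.activate(state, reward)
        state, reward = sim.step(action)
    return agent.q  # the learned table stands in for the trained model
```

Under these assumptions, a call such as train(ReferenceModel({"n_levels": 5}, {"alpha": 0.5, "gamma": 0.9, "epsilon": 0.1}), steps=2000) returns a 5-by-2 table of action values. The request handling, network selection, and data-store lookup recited in the claim are omitted here because the claim gives no detail from which to sketch them.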