US 11,836,577 B2
Reinforcement learning model training through simulation
Sunil Mallya Kasaragod, San Francisco, CA (US); Sahika Genc, Mercer Island, WA (US); Leo Parker Dirac, Seattle, WA (US); Bharathan Balaji, Seattle, WA (US); Eric Li Sun, Milpitas, CA (US); and Marthinus Coenraad De Clercq Wentzel, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Nov. 27, 2018, as Appl. No. 16/201,830.
Prior Publication US 2020/0167686 A1, May 28, 2020
Int. Cl. G06F 30/27 (2020.01); G06N 20/00 (2019.01); B25J 9/16 (2006.01); G06F 30/20 (2020.01); G06F 7/02 (2006.01); G06N 5/022 (2023.01)

CPC G06N 20/00 (2019.01) [B25J 9/163 (2013.01); B25J 9/1605 (2013.01); B25J 9/1671 (2013.01); G06F 7/023 (2013.01); G06F 30/20 (2020.01); G06F 30/27 (2020.01); G06N 5/022 (2013.01)]

20 Claims

1. A computer-implemented method, comprising:

receiving, from a customer of a simulation management service, first computer-executable code defining a custom-designed reinforcement function for training a reinforcement learning model for a system;

evaluating, by the simulation management service, the first computer-implemented executable code to identify one or more suggestions to modify the first computer-implemented executable code;

modifying, by the simulation management service, the first computer-implemented executable code based at least on the one or more suggestions to generate a second computer-implemented executable code defining the custom-designed reinforcement function, the one or more suggestions determined based at least in part on prior computer-executable code for one or more other reinforcement functions;

storing the second computer-executable code in association with an identifier of the custom-designed reinforcement function;

receiving a request to perform reinforcement learning for the system using a simulation application, the request specifying the identifier;

generating a simulation environment by at least using the identifier to obtain the second computer-executable code and injecting the second computer-executable code into the simulation application; and

performing the reinforcement learning using the simulation environment.
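
As an illustration only, the steps recited in claim 1 can be sketched as a toy Python workflow. Every name below (REWARD_STORE, evaluate, modify, register, build_environment, train, the "off_track" heuristic, and the one-dimensional lane simulation) is a hypothetical stand-in chosen for this sketch and is not taken from the patent. The evaluator is a single hard-coded rule, whereas the claim recites suggestions derived from prior code for other reinforcement functions. Tabular Q-learning is used here merely as one example of reinforcement learning.

```python
import random
import uuid

# Hypothetical store: identifier -> source code of the reinforcement (reward) function.
REWARD_STORE = {}

# Hypothetical customer submission (the "first" code).
CUSTOMER_CODE = '''
def reward_function(state):
    return 1.0 / (1.0 + abs(state["offset"]))
'''

# Hypothetical patch appended when the suggestion applies (yields the "second" code).
OFF_TRACK_PATCH = '''
_customer_reward = reward_function

def reward_function(state):
    if state["off_track"]:
        return 1e-3
    return _customer_reward(state)
'''


def evaluate(code):
    """Return suggestions; a one-rule stand-in for comparison against prior code."""
    return ["penalize off_track"] if "off_track" not in code else []


def modify(code, suggestions):
    """Apply the suggestions to produce the modified code."""
    return code + OFF_TRACK_PATCH if "penalize off_track" in suggestions else code


def register(code):
    """Evaluate, modify, and store the code; return its identifier."""
    identifier = str(uuid.uuid4())
    REWARD_STORE[identifier] = modify(code, evaluate(code))
    return identifier


def build_environment(identifier):
    """Look up the stored code and inject its reward_function into a toy simulation."""
    namespace = {}
    # Assumption: the code is trusted; a real service would sandbox this step.
    exec(REWARD_STORE[identifier], namespace)
    reward_fn = namespace["reward_function"]

    def step(offset, action):
        new_offset = offset + action
        off_track = abs(new_offset) > 1
        state = {"offset": new_offset, "off_track": off_track}
        # Leaving the lane resets the vehicle to the center line.
        return (0 if off_track else new_offset), reward_fn(state)

    return step


def train(step, episodes=200, seed=0):
    """Tabular Q-learning over lane offsets {-1, 0, 1} and steering actions {-1, 0, 1}."""
    rng = random.Random(seed)
    actions = (-1, 0, 1)
    q = {(s, a): 0.0 for s in (-1, 0, 1) for a in actions}
    for _ in range(episodes):
        s = 0
        for _ in range(10):
            if rng.random() < 0.2:  # explore
                a = rng.choice(actions)
            else:  # exploit the current estimate
                a = max(actions, key=lambda x: q[(s, x)])
            s2, r = step(s, a)
            best_next = max(q[(s2, x)] for x in actions)
            q[(s, a)] += 0.5 * (r + 0.9 * best_next - q[(s, a)])
            s = s2
    return q


if __name__ == "__main__":
    rid = register(CUSTOMER_CODE)          # receive, evaluate, modify, store
    q_table = train(build_environment(rid))  # request by identifier, inject, learn
    print(rid, len(q_table))
```

In this sketch the request to perform reinforcement learning carries only the identifier, so the simulation obtains the modified code from the store rather than from the customer directly, mirroring the order of the claimed steps.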