| CPC G06N 20/00 (2019.01) [B25J 9/1605 (2013.01); B25J 9/163 (2013.01); B25J 9/1671 (2013.01); G06F 30/20 (2020.01); G06F 30/27 (2020.01); G05B 2219/33056 (2013.01); G06N 3/006 (2013.01)] | 21 Claims |

|
6. A first system, comprising:
one or more processors; and
memory that stores computer-executable instructions that, if executed, cause the first system to:
execute a simulation of a robotic device in a simulation environment, the simulation comprising an agent representing a second system using a reinforcement learning model to operate within the simulation environment;
obtain data indicating how the agent performed in the simulation environment;
transmit the data to another system to cause the other system to:
run a model training application;
use the data generated by the agent to update the reinforcement learning model to produce an updated reinforcement learning model; and
provide the updated reinforcement learning model to the agent at the simulation environment;
obtain, by the agent, the updated reinforcement learning model;
execute the simulation of the system according to the updated reinforcement learning model;
obtain a notification from the other system that indicates that a termination requirement for the simulation has been satisfied; and
make available the updated reinforcement learning model for optimizing an application of the second system in response to the notification.
|
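The loop recited in claim 6 can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Model`, `Trainer`, `run_episode`, and `simulate_until_terminated`, the two-action reward table, the epsilon-greedy agent, and the mean-reward termination requirement are all hypothetical choices made for illustration. An in-process method call stands in for transmitting data to the other system.

```python
import random
from dataclasses import dataclass

# Hypothetical simulation environment: action 1 is rewarded, action 0 is not.
REWARDS = [0.0, 1.0]


@dataclass
class Model:
    """Hypothetical stand-in for a reinforcement learning model (one value per action)."""
    prefs: list
    version: int = 0


class Trainer:
    """Hypothetical 'other system' running the model training application."""

    def __init__(self, target_reward, lr=0.5):
        self.target_reward = target_reward  # assumed termination requirement
        self.lr = lr

    def update(self, model, experience):
        # Use the agent's data to produce an updated model.
        prefs = list(model.prefs)
        for action, reward in experience:
            prefs[action] += self.lr * (reward - prefs[action])
        updated = Model(prefs, model.version + 1)
        # Notify the caller whether the termination requirement is satisfied.
        mean_reward = sum(r for _, r in experience) / len(experience)
        return updated, mean_reward >= self.target_reward


def run_episode(model, rng, steps=20, epsilon=0.1):
    """Execute one simulation run and return data on how the agent performed."""
    experience = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(model.prefs))  # explore
        else:
            action = max(range(len(model.prefs)), key=lambda a: model.prefs[a])
        experience.append((action, REWARDS[action]))
    return experience


def simulate_until_terminated(trainer, seed=0, max_rounds=50):
    """Alternate simulation and training until the trainer signals termination."""
    rng = random.Random(seed)
    model = Model(prefs=[0.0, 0.0])
    for round_no in range(1, max_rounds + 1):
        experience = run_episode(model, rng)                   # execute, obtain data
        model, terminated = trainer.update(model, experience)  # transmit, obtain update
        if terminated:
            return model, round_no, True                       # make model available
    return model, max_rounds, False


if __name__ == "__main__":
    final_model, rounds, done = simulate_until_terminated(Trainer(target_reward=0.8))
    print(final_model, rounds, done)
```

In this sketch the trainer both updates the model and decides when to stop, mirroring the claim's division of work between the first system (simulation) and the other system (training and notification). The `max_rounds` cap is an added safeguard so the loop always ends.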