CPC G05D 1/0214 (2013.01) [G05D 1/0221 (2013.01); G05D 1/0223 (2013.01); G05D 1/0238 (2013.01)] | 12 Claims |
1. A mobile object control device comprising:
a storage device configured to store a program; and
a hardware processor, wherein
the hardware processor executes the program stored in the storage device to:
determine a route of the mobile object according to a number of obstacles existing around the mobile object, and
move the mobile object along the determined route, wherein
the hardware processor
determines a route of the mobile object based on a policy of an operation learned by a plurality of simulators and a learning part; and
the policy of the operation is
learned by the plurality of simulators simultaneously executing a simulation in parallel, and
the learning part updating parameters of the policy so as to maximize a reward obtained by applying a reward function to each result of the simulation of the plurality of the simulators,
wherein each of the plurality of the simulators executes the simulation of an operation of the mobile object and the obstacles for a corresponding environment among a plurality of environments, each environment having a different number of obstacles from the others.
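The learning arrangement recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the policy form (a speed chosen from the obstacle count), the collision model, the reward function, and the hill-climbing update are all assumptions made for illustration, and every name (`policy`, `simulate`, `reward_fn`, `evaluate`, `learn`) is hypothetical. Each simulator handles one environment with its own obstacle count, the simulations are submitted to a thread pool so they run concurrently, and the learning part keeps a parameter update only when the summed reward increases.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def policy(params, n_obstacles):
    """Hypothetical policy: choose a speed from the number of nearby obstacles."""
    base, slope = params
    return max(0.0, base + slope * n_obstacles)


def simulate(params, n_obstacles, steps=20):
    """Toy simulator for one environment with a fixed obstacle count."""
    speed = policy(params, n_obstacles)
    progress = speed * steps
    # Assumed crowding model: collisions grow with speed and obstacle count.
    collisions = 0.05 * n_obstacles * speed ** 2 * steps
    return {"progress": progress, "collisions": collisions}


def reward_fn(result):
    """Assumed reward function: reward progress, penalize collisions."""
    return result["progress"] - result["collisions"]


def evaluate(params, obstacle_counts, pool):
    """Run one simulation per environment concurrently and sum the rewards."""
    results = pool.map(lambda n: simulate(params, n), obstacle_counts)
    return sum(reward_fn(r) for r in results)


def learn(obstacle_counts, iterations=200, seed=0):
    """Learning part: keep a perturbed parameter set only if it raises the reward."""
    rng = random.Random(seed)
    params = [0.5, 0.0]  # arbitrary starting parameters
    with ThreadPoolExecutor(max_workers=len(obstacle_counts)) as pool:
        initial = best = evaluate(params, obstacle_counts, pool)
        for _ in range(iterations):
            candidate = [p + rng.gauss(0.0, 0.1) for p in params]
            score = evaluate(candidate, obstacle_counts, pool)
            if score > best:
                params, best = candidate, score
    return params, initial, best


if __name__ == "__main__":
    # Three environments, each with a different number of obstacles.
    learned, initial, best = learn(obstacle_counts=[2, 5, 10])
    print("learned parameters:", learned)
    print("summed reward before and after learning:", initial, best)
```

A thread pool keeps the sketch short; for CPU-bound simulators a process pool or separate machines would be the more likely choice, and a gradient-based update could replace the hill-climbing step without changing the overall structure.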