US 12,140,913 B2
Control device, unmanned system, control method, and program
Yasuo Fujishima, Tokyo (JP); and Natsuki Matsunami, Tokyo (JP)
Assigned to MITSUBISHI HEAVY INDUSTRIES, LTD., Tokyo (JP)
Appl. No. 16/652,506
Filed by MITSUBISHI HEAVY INDUSTRIES, LTD., Tokyo (JP)
PCT Filed Nov. 2, 2018, PCT No. PCT/JP2018/040865 § 371(c)(1), (2) Date Mar. 31, 2020, PCT Pub. No. WO2019/111608, PCT Pub. Date Jun. 13, 2019.
Claims priority of application No. 2017-236475 (JP), filed on Dec. 8, 2017.
Prior Publication US 2020/0285202 A1, Sep. 10, 2020
Int. Cl. G06N 3/08 (2023.01); G05B 13/02 (2006.01); G06N 3/045 (2023.01); G06N 3/042 (2023.01); G06N 20/00 (2019.01)

CPC G05B 13/027 (2013.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 3/042 (2023.01); G06N 20/00 (2019.01)]

8 Claims

1. A control device comprising:

a non-transitory tangible medium;

a processor configured to execute a program stored in the non-transitory tangible medium to cause the control device to:

acquire, from a sensor, environment information, wherein the environment information comprises a state information of an apparatus and a state of an operation environment of the apparatus, wherein the apparatus is a moving object which is unmanned, and wherein the sensor includes a detection sensor that detects an obstacle and a position sensor that measures a position of the apparatus;

select an action scheme of the apparatus based on a rule base using the state information or the state of the environment;

calculate a parameter for determining content of an action in the action scheme on the basis of any of the environment information inputted to an actor neural network of a learning model;

output a control signal to the apparatus based on the parameter, wherein the control signal causes the apparatus to execute the action;

evaluate, by a critic neural network of the learning model, the action based on a value of the parameter;

update the learning model on the basis of the evaluation, wherein the updating the learning model comprises learning the action of the apparatus executing using the parameter and the evaluation;

update the actor neural network and the critic neural network by a deep deterministic policy gradient (DDPG);

detect a presence of the obstacle based on information detected by the detection sensor;

select an action of avoiding the obstacle, as the action scheme, if the obstacle is detected, and to select an action of causing the apparatus to proceed straight along a route to a destination, as the action scheme, if the obstacle is not detected;

in response to selection of the action of avoiding the obstacle, calculate a parameter indicating an avoidance route of avoiding the obstacle on the basis of the position of the apparatus measured by the position sensor and a distance to the obstacle detected by the detection sensor, wherein the parameter indicating the avoidance route is calculated as a correction amount for an initial value which corresponds to the distance to the obstacle as indicated in an initial table; and

cause the apparatus to move along the avoidance route indicated by the parameter calculated if the action of avoiding the obstacle is selected, and to cause the apparatus to move along the route to the destination if the action of causing the apparatus to proceed straight is selected.
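
As an illustration only, the rule-based selection and the table-plus-correction steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names INITIAL_TABLE, initial_value, select_action_scheme, avoidance_parameter and control_step are hypothetical, the table values are invented for the example, the correction is assumed to be additive, and the actor neural network is replaced by a plain callable that returns a correction amount.

```python
from bisect import bisect_right
from typing import Callable, Optional, Tuple

Position = Tuple[float, float]
# Stand-in for the actor neural network: maps (position, distance) to a correction.
Actor = Callable[[Position, float], float]

# Hypothetical initial table: (distance to obstacle [m], initial avoidance offset [m]).
# The values are made up for illustration; the claim does not specify them.
INITIAL_TABLE = [(5.0, 4.0), (10.0, 3.0), (20.0, 2.0), (40.0, 1.0)]


def initial_value(distance: float) -> float:
    """Return the table entry for the largest tabulated distance <= distance.

    Distances below the first row are clamped to the first row.
    """
    keys = [d for d, _ in INITIAL_TABLE]
    idx = max(bisect_right(keys, distance) - 1, 0)
    return INITIAL_TABLE[idx][1]


def select_action_scheme(obstacle_distance: Optional[float]) -> str:
    """Rule base: avoid if an obstacle is detected, otherwise proceed straight."""
    return "avoid" if obstacle_distance is not None else "straight"


def avoidance_parameter(position: Position, distance: float, actor: Actor) -> float:
    """Initial value from the table plus the actor's correction (assumed additive)."""
    return initial_value(distance) + actor(position, distance)


def control_step(
    position: Position, obstacle_distance: Optional[float], actor: Actor
) -> Tuple[str, Optional[float]]:
    """Select the action scheme, then compute its parameter if one is needed."""
    scheme = select_action_scheme(obstacle_distance)
    if scheme == "avoid":
        return scheme, avoidance_parameter(position, obstacle_distance, actor)
    return scheme, None  # proceed straight along the route to the destination


def stub_actor(position: Position, distance: float) -> float:
    """Placeholder for a trained actor network; returns a fixed correction."""
    return 0.25
```

With the stub actor, control_step((0.0, 0.0), 12.0, stub_actor) selects the avoidance scheme and returns the table value for the 10 m row plus the fixed correction, while passing None as the obstacle distance selects the straight-ahead scheme with no parameter. The critic evaluation and the DDPG update of both networks are deliberately left out of this sketch.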