US 12,420,407 B2
Information processing device, information processing method, and computer program product
Toshimitsu Kaneko, Kawasaki (JP); Kenichi Shimoyama, Ota (JP); and Tatsuya Tanaka, Kawasaki (JP)
Assigned to KABUSHIKI KAISHA TOSHIBA, Tokyo (JP)
Filed by KABUSHIKI KAISHA TOSHIBA, Tokyo (JP)
Filed on Aug. 30, 2021, as Appl. No. 17/446,310.
Claims priority of application No. 2020-185041 (JP), filed on Nov. 5, 2020.
Prior Publication US 2022/0134545 A1, May 5, 2022
Int. Cl. B25J 9/16 (2006.01); G05B 13/02 (2006.01)

CPC B25J 9/163 (2013.01) [G05B 13/0265 (2013.01)]

14 Claims

1. An information processing device comprising:

processing circuitry configured to:

acquire one or more pieces of first state information and one or more pieces of second state information, the first state information representing a state of each of one or more subjects at time t1, the one or more subjects each including a first region as a starting location of an object that a robot moves and a second region as a moving destination of the object, the second state information representing a state of each of the one or more subjects at time t2 after the time t1, the first state information and the second state information being obtained from one action of the robot from the time t1 to the time t2;

generate learning data for use in reinforcement learning of a machine learning model for use in inference of an action of the robot; and

perform the reinforcement learning of the machine learning model using the learning data,

the learning data including first learning data and second learning data,

the first learning data including the one or more pieces of first state information and the one or more pieces of second state information,

the second learning data including:

the one or more pieces of first state information at least part of which is replaced with any of the one or more pieces of the second state information, and

the one or more pieces of second state information at least part of which is replaced with any of the one or more pieces of first state information,

wherein the processing circuitry is configured to control actions of the robot based on an output of the machine learning model.
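
The learning-data generation recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that each piece of state information is a small tuple (for example, an object count per subject) and that replacement is done by swapping the t1 and t2 states of the same subject. The claim itself only requires that at least part of each set be replaced with any piece from the other set. The names State, Transition, make_first_learning_data, and make_second_learning_data are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical encoding of one subject's state (assumption, not from the claim).
State = Tuple[int, ...]


@dataclass(frozen=True)
class Transition:
    """One learning-data record: subject states at time t1 and at time t2."""
    states_t1: Tuple[State, ...]
    states_t2: Tuple[State, ...]


def make_first_learning_data(
    first_states: Sequence[State], second_states: Sequence[State]
) -> Transition:
    """First learning data: the observed states, unchanged."""
    return Transition(tuple(first_states), tuple(second_states))


def make_second_learning_data(
    first_states: Sequence[State],
    second_states: Sequence[State],
    swap_indices: Sequence[int],
) -> Transition:
    """Second learning data: swap the t1/t2 states of the chosen subjects.

    Assumption: a same-subject swap is one way to satisfy "at least part
    of which is replaced"; the claim allows other replacement choices.
    """
    if len(first_states) != len(second_states):
        raise ValueError("state lists must describe the same subjects")
    if not swap_indices:
        raise ValueError("at least one subject must be replaced")
    new_t1 = list(first_states)
    new_t2 = list(second_states)
    for i in swap_indices:
        new_t1[i] = second_states[i]  # t1 entry replaced with a t2 state
        new_t2[i] = first_states[i]   # t2 entry replaced with a t1 state
    return Transition(tuple(new_t1), tuple(new_t2))


if __name__ == "__main__":
    # Two subjects, each described by a single object count (illustrative values).
    s1 = [(3,), (0,)]  # states at time t1
    s2 = [(2,), (1,)]  # states at time t2, after one robot action
    print(make_first_learning_data(s1, s2))
    print(make_second_learning_data(s1, s2, swap_indices=[0]))
```

Both records could then be added to the data used for reinforcement learning, so that one observed robot action yields more than one training sample.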