CPC G06N 20/00 (2019.01) [G06N 10/60 (2022.01); G06N 5/01 (2023.01); G06N 7/01 (2023.01)]
8 Claims

1. A learning device comprising:
a memory storing software code; and
a hardware processor configured to execute the software code to:
receive input of a functional form of a reward used in a reward function specifying a reward for performance of an action by an autonomous vehicle for a state of an environment of the autonomous vehicle, wherein
the state includes one or more of a map of and/or road conditions in the surroundings of the autonomous vehicle, and positions and/or speeds of other vehicles in the surroundings, and
the action includes changing a path of the autonomous vehicle and a speed of the autonomous vehicle;
specify a model by which the reward function is to be learned, based on the functional form of the reward, wherein the functional form of the reward is input as a binary neural network or a Hubbard model;
learn the reward function according to the specified model, thereby learning a policy for selecting the action to be performed by the autonomous vehicle based on the state of the environment of the autonomous vehicle;
receive a state of the environment of the autonomous vehicle;
determine the action to be performed by the autonomous vehicle by applying the learned policy to the state of the environment of the autonomous vehicle; and
control the autonomous vehicle to cause the autonomous vehicle to perform the determined action, wherein
when the state indicates that there is an obstacle in front of the autonomous vehicle, the determined action is to change the path of the autonomous vehicle to avoid the obstacle, such that the learned policy provides a practical improvement in autonomous vehicle technology by providing for obstacle avoidance.
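The control flow recited in claim 1 (receive a state, determine an action by applying the learned policy, control the vehicle accordingly) can be sketched as follows. This is a minimal illustrative sketch, not the claimed implementation: the names (`VehicleState`, `learned_policy`, `control_step`) and the simple functional form of the reward are hypothetical stand-ins, and the greedy action selection stands in for the policy that the claim learns from the reward function via the specified model.

```python
# Hypothetical sketch of the claimed control loop. All names and the
# reward's functional form are illustrative assumptions, not from the claim.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VehicleState:
    # State per the claim: surroundings/road conditions, plus
    # positions and speeds of other vehicles (position, speed pairs).
    obstacle_ahead: bool
    other_vehicles: List[Tuple[float, float]] = field(default_factory=list)

ACTIONS = ["keep_path", "change_path", "change_speed"]

def reward(state: VehicleState, action: str) -> float:
    # Assumed functional form of the reward: any action other than
    # changing the path while an obstacle is ahead is heavily penalized;
    # maneuvering carries a small cost otherwise.
    if state.obstacle_ahead and action != "change_path":
        return -10.0
    if action == "keep_path":
        return 0.0
    return -1.0

def learned_policy(state: VehicleState) -> str:
    # Stand-in for the learned policy: greedily select the action
    # with the highest reward for the received state.
    return max(ACTIONS, key=lambda a: reward(state, a))

def control_step(state: VehicleState) -> str:
    # Determine the action for the received state; a real device would
    # then control the vehicle to perform it.
    return learned_policy(state)

# With an obstacle ahead, the policy changes the path to avoid it.
print(control_step(VehicleState(obstacle_ahead=True)))   # change_path
print(control_step(VehicleState(obstacle_ahead=False)))  # keep_path
```

Under this assumed reward, the sketch reproduces the obstacle-avoidance behavior recited in the final wherein clause: the path is changed exactly when the state indicates an obstacle in front of the vehicle.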