US 12,229,218 B2
Learning device, learning method, and learning program
Ryota Higa, Tokyo (JP)
Assigned to NEC CORPORATION, Tokyo (JP)
Appl. No. 17/296,796
Filed by NEC Corporation, Tokyo (JP)
PCT Filed Dec. 7, 2018, PCT No. PCT/JP2018/045112 § 371(c)(1), (2) Date May 25, 2021, PCT Pub. No. WO2020/115903, PCT Pub. Date Jun. 11, 2020.
Prior Publication US 2022/0012540 A1, Jan. 13, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 17/11 (2006.01); G06F 18/21 (2023.01)

CPC G06F 18/217 (2023.01) [G06F 17/11 (2013.01); G06N 20/00 (2019.01)]

11 Claims

8. A method performed by a computer and comprising:

receiving input of a type of a reward function, the reward function specifying a reward for performance of an action by an autonomous vehicle for a state of an environment of the autonomous vehicle, wherein

the state includes one or more of a map of and/or road conditions in the surroundings of the autonomous vehicle, and positions and/or speeds of other vehicles in the surroundings, and

the action includes changing a path of the autonomous vehicle and a speed of the autonomous vehicle;

specifying a model by which the reward function is to be learned, based on the type of the reward function, wherein the model is a deep neural network;

learning the reward function according to the specified model, thereby learning a policy for selecting the action to be performed by the autonomous vehicle based on the state of the environment of the autonomous vehicle;

receiving a current state of the environment of the autonomous vehicle;

determining the action to be performed by the autonomous vehicle by applying the learned policy to the current state of the environment of the autonomous vehicle; and

controlling the autonomous vehicle to cause the autonomous vehicle to perform the determined action, wherein

when the current state indicates that there is an obstacle in front of the autonomous vehicle, the determined action is to change the path of the autonomous vehicle to avoid the obstacle, such that the learned policy provides a practical improvement in autonomous vehicle technology in that the learned policy provides for obstacle avoidance.
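
The steps of claim 8 can be illustrated with a minimal sketch. It is not the patented implementation: every name in it (State, Action, RewardNetwork, specify_model, learn_reward, policy, control, and the reward type label "nonlinear") is hypothetical, the small feed-forward network only stands in for the claimed deep neural network, and the learning step is replaced by hand-set placeholder weights because the claim does not fix a training procedure. The sketch shows only the control flow: a reward type selects a model, the model yields a reward function, and a policy that is greedy in that reward picks the action for the current state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# All names here are hypothetical; the patent record defines no API.
@dataclass(frozen=True)
class State:
    obstacle_ahead: bool        # stands in for map / road conditions
    other_vehicle_speed: float  # speed of a nearby vehicle (m/s)


@dataclass(frozen=True)
class Action:
    path_offset_m: float    # lateral change of path; 0.0 keeps the path
    speed_delta_mps: float  # change of speed


ACTIONS: List[Action] = [
    Action(0.0, 0.0), Action(0.0, -2.0), Action(3.5, 0.0), Action(3.5, -2.0),
]


def features(state: State, action: Action) -> List[float]:
    """Hand-picked features of a (state, action) pair (an assumption)."""
    keeps_path = action.path_offset_m == 0.0
    return [
        1.0 if (state.obstacle_ahead and keeps_path) else 0.0,  # collision risk
        abs(action.path_offset_m),
        abs(action.speed_delta_mps),
    ]


class RewardNetwork:
    """One hidden ReLU layer; a stand-in for the claimed deep neural network."""

    def __init__(self, hidden: List[List[float]], output: List[float]):
        self.hidden, self.output = hidden, output

    def __call__(self, state: State, action: Action) -> float:
        x = features(state, action)
        h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.hidden]
        return sum(w * v for w, v in zip(self.output, h))


def build_network() -> RewardNetwork:
    # Placeholder weights, NOT learned: heavy penalty on collision risk,
    # light penalties on changing path or speed.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return RewardNetwork(identity, [-10.0, -0.1, -0.1])


# Hypothetical mapping from reward function type to model.
MODEL_BY_REWARD_TYPE: Dict[str, Callable[[], RewardNetwork]] = {
    "nonlinear": build_network,
}


def specify_model(reward_type: str) -> Callable[[], RewardNetwork]:
    """Specify the model based on the received reward function type."""
    if reward_type not in MODEL_BY_REWARD_TYPE:
        raise ValueError(f"unknown reward type: {reward_type!r}")
    return MODEL_BY_REWARD_TYPE[reward_type]


def learn_reward(model_factory: Callable[[], RewardNetwork]) -> RewardNetwork:
    """Training is omitted in this sketch; the placeholder model is returned."""
    return model_factory()


def policy(reward: RewardNetwork, state: State) -> Action:
    """Greedy policy: choose the action with the highest reward."""
    return max(ACTIONS, key=lambda a: reward(state, a))


def control(action: Action) -> str:
    """Stand-in for actuation: format the command that would be issued."""
    return f"path {action.path_offset_m:+.1f} m, speed {action.speed_delta_mps:+.1f} m/s"


if __name__ == "__main__":
    reward = learn_reward(specify_model("nonlinear"))
    current = State(obstacle_ahead=True, other_vehicle_speed=10.0)
    print(control(policy(reward, current)))
```

With these placeholder weights, a state that reports an obstacle ahead selects an action with a nonzero path offset, and a state without an obstacle keeps the current path and speed. A real system would obtain the weights from the learning step rather than setting them by hand.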