US 11,745,746 B2
Method for generating vehicle controlling data, vehicle controller, vehicle control system, and learning device for vehicle
Yosuke Hashimoto, Nagakute (JP); Akihiro Katayama, Toyota (JP); Yuta Oshiro, Nagoya (JP); Kazuki Sugie, Toyota (JP); and Naoya Oka, Nagakute (JP)
Assigned to TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota (JP)
Filed by TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota (JP)
Filed on Dec. 29, 2020, as Appl. No. 17/136,220.
Claims priority of application No. 2020-002031 (JP), filed on Jan. 9, 2020.
Prior Publication US 2021/0213963 A1, Jul. 15, 2021
Int. Cl. B60W 50/00 (2006.01); G07C 5/02 (2006.01); G06N 20/00 (2019.01); F02D 41/24 (2006.01)

CPC B60W 50/00 (2013.01) [G06N 20/00 (2019.01); G07C 5/02 (2013.01); B60W 2050/0083 (2013.01); B60W 2520/10 (2013.01); F02D 41/2438 (2013.01)]

6 Claims

1. A method for generating vehicle controlling data, the method comprising: when relationship specifying data that specifies a relationship between a state of a vehicle and an action variable, which is a variable related to operation of an electronic device mounted on the vehicle, is stored in a storage device,

executing an operating process that operates the electronic device with processing circuitry;

executing an obtaining process that obtains an area variable, which is a variable indicating an area in which the vehicle is located, a state of the vehicle based on a detection value of a sensor with the processing circuitry, and includes a value distinguishing between areas divided based on an average vehicle speed;

executing a reward calculation process that assigns a reward based on the state of the vehicle obtained by the obtaining process with the processing circuitry, the reward being larger when a property of the vehicle meets a predetermined criterion and relatively smaller when the property of the vehicle does not meet the predetermined criterion; and

executing an updating process that uses the state of the vehicle obtained by the obtaining process, a value of the action variable used for operation of the electronic device, and the reward corresponding to operation of the electronic device as inputs to a predetermined update mapping to update the relationship specifying data with the processing circuitry, wherein

the update mapping is configured to output the relationship specifying data that is updated to increase an expected return of the reward when the electronic device is operated in accordance with the relationship specifying data, and

the reward calculation process includes a changing process that changes a reward assigned when the area variable equals a second value indicating that the average vehicle speed is high and the property of the vehicle is a predetermined property from a reward assigned when the area variable equals a first value indicating that the average vehicle speed is low and the property of the vehicle is the predetermined property.
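
For illustration only, the processes recited in claim 1 can be sketched as a small reinforcement-learning loop. The claim does not name a learning algorithm, so this sketch assumes tabular Q-learning as one possible "update mapping" and a Q-table as the "relationship specifying data". The state encoding, the reward values, and all names (LOW_SPEED_AREA, HIGH_SPEED_AREA, calculate_reward, update, select_action) are hypothetical and do not come from the patent.

```python
import random
from collections import defaultdict

# Hypothetical encodings of the area variable (not taken from the patent).
LOW_SPEED_AREA = 0   # "first value": average vehicle speed is low
HIGH_SPEED_AREA = 1  # "second value": average vehicle speed is high


def calculate_reward(area, meets_criterion):
    """Reward calculation process with an area-dependent changing process.

    The reward is larger when the vehicle property meets the criterion.
    For the same property, the reward differs between areas; the numeric
    values here are invented for illustration.
    """
    if not meets_criterion:
        return -1.0
    return 2.0 if area == HIGH_SPEED_AREA else 1.0


def update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step, assumed here as the update mapping.

    It moves the stored value toward reward + discounted best next value,
    which raises the expected return of following the table greedily.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def select_action(q, state, actions, epsilon=0.1, rng=random):
    """Operating process: pick an action value from the table (epsilon-greedy)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = defaultdict(float)      # relationship specifying data (assumed Q-table)
    actions = [0, 1]            # hypothetical action-variable values
    state = (LOW_SPEED_AREA, 3)  # hypothetical state: (area, speed bucket)
    action = select_action(q, state, actions)
    reward = calculate_reward(state[0], meets_criterion=True)
    update(q, state, action, reward, state, actions)
    print(q[(state, action)])
```

In this sketch the area variable is folded into the state tuple, which is one design choice among several; the claim requires only that the reward for the same vehicle property differ between the low-speed and high-speed areas.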