US 12,275,388 B2
Adaptation system and adaptation method
Akihiro Katayama, Toyota (JP); Shiro Yano, Tokyo (JP); and Kenichiro Kumada, Nagakute (JP)
Assigned to TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota (JP)
Filed by TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota (JP)
Filed on Jan. 22, 2024, as Appl. No. 18/419,369.
Claims priority of application No. 2023-024534 (JP), filed on Feb. 20, 2023.
Prior Publication US 2024/0300470 A1, Sep. 12, 2024
Int. Cl. B60W 10/06 (2006.01); B60W 50/00 (2006.01)
CPC B60W 10/06 (2013.01) [B60W 50/00 (2013.01); B60W 2050/0088 (2013.01); B60W 2420/54 (2013.01); B60W 2510/0685 (2013.01); B60W 2520/105 (2013.01)] 8 Claims
OG exemplary drawing
 
1. An adaptation system for optimizing a function used to control a motor, the system comprising:
processing circuitry; and
a storage device, wherein
the processing circuitry is configured to repeat a learning routine until a number of executions of the learning routine reaches a number of terminations, thereby optimizing the function to be stored in a control device for controlling the motor, the learning routine including:
a trial that drives the motor while acquiring a state variable by a sensor in a state in which a change has been added to the function for outputting a command value to the motor;
an evaluation that calculates a reward based on the acquired state variable; and
learning that updates the function based on the reward, and
the processing circuitry is configured to execute
a first process that, until a specified condition for determining that optimization has progressed to a final stage is met, performs a first trial and a second trial in which the change is added to the function so as to adjust, in a sign reversing direction, the command value output from the function in each execution of the learning routine, updates the function by reflecting, in the function, the change in one of the first trial and the second trial in which the reward is larger, and ends the learning routine, and
a second process that, until the number of executions of the learning routine reaches the number of terminations after the specified condition is met, executes the first trial and the second trial multiple times in each learning routine, compares the reward for the multiple executions of the first trial with the reward for the multiple executions of the second trial, updates the function by reflecting, in the function, the change in one of the first trial and the second trial in which the reward is larger, and ends the learning routine, wherein
the processing circuitry is configured to, in the second process:
perform each of the first trial and the second trial three times;
compare the reward of the first trial with the reward of the second trial for each of the first, second, and third executions; and
reflect, in the function, the change of the one of the first trial and the second trial for which the reward is determined to be larger in a greater number of the comparisons.
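The claimed loop amounts to a sign-reversed perturbation search: each learning routine tries a change and its opposite, keeps whichever earns the higher reward, and, once the final stage is reached, repeats each trial three times and decides by majority vote. A minimal sketch of that logic, where `reward_fn` is a hypothetical stand-in for driving the motor, reading the sensor's state variable, and computing the reward, and `theta` stands in for the function's adjustable parameter:

```python
def adapt(theta, reward_fn, n_terminations=100, step=0.1, final_stage_after=80):
    """Sign-reversed perturbation search sketched from the claim.

    theta             -- parameter of the function that outputs the command value
    reward_fn         -- hypothetical stand-in for a trial: drive the motor,
                         acquire the state variable, and evaluate a reward
    n_terminations    -- number of terminations for the learning routine
    final_stage_after -- stand-in for the specified condition that marks
                         the final stage of optimization
    """
    for execution in range(n_terminations):
        # First and second trials add the change in sign-reversing directions.
        plus, minus = theta + step, theta - step
        if execution < final_stage_after:
            # First process: one trial per direction; reflect the change
            # whose reward is larger, then end the learning routine.
            theta = plus if reward_fn(plus) > reward_fn(minus) else minus
        else:
            # Second process (final stage): perform each trial three times,
            # compare rewards execution by execution, and reflect the change
            # that wins the greater number of comparisons.
            wins_plus = sum(reward_fn(plus) > reward_fn(minus) for _ in range(3))
            theta = plus if wins_plus >= 2 else minus
    return theta
```

With a noisy `reward_fn`, the three-fold majority vote of the second process makes the final-stage updates less sensitive to a single unlucky trial than the single comparison used in the first process.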