US 12,454,276 B2
	System and method of adaptively tuning parameters in action planning for automated driving
Stefano Bonasera, Royal Oak, MI (US); Daniel Aguilar Marsillach, Detroit, MI (US); and Sayyed Rouhollah Jafari Tafti, Troy, MI (US)
Assigned to GM GLOBAL TECHNOLOGY OPERATIONS LLC, Detroit, MI (US)
Filed by GM Global Technology Operations LLC, Detroit, MI (US)
Filed on Jan. 11, 2023, as Appl. No. 18/152,842.
Prior Publication US 2024/0227829 A1, Jul. 11, 2024
Int. Cl. B60W 50/06 (2006.01); B60W 60/00 (2020.01)

CPC B60W 50/06 (2013.01) [B60W 60/0013 (2020.02); B60W 60/0015 (2020.02); B60W 2520/10 (2013.01); B60W 2520/105 (2013.01); B60W 2552/05 (2020.02); B60W 2554/802 (2020.02); B60W 2555/20 (2020.02); B60W 2555/60 (2020.02)]

20 Claims

1. A method of adaptively tuning parameters in action planning for automated driving of a vehicle to a destination, the method comprising:

receiving sensor data from a sensor of the vehicle, lane data of a road plan to the destination, and a plurality of first hyperparameters in a reinforcement learning agent at an initial state representing a current situation of the reinforcement learning agent and an initial timestamp, the reinforcement learning agent having a planning policy to adaptively optimize the plurality of first hyperparameters;

adjusting the plurality of first hyperparameters based on the sensor data and the lane data via the planning policy having at least one first activation function to define an output, the at least one first activation function including:

where input includes the sensor data, the lane data, and each first hyperparameter; W is a weight to scale each first hyperparameter defining a plurality of scaled hyperparameters; b is a bias to adjust each scaled hyperparameter; and o is the output of the first activation function defining a plurality of second hyperparameters;

determining a baseline trajectory action based on the plurality of second hyperparameters, the sensor data, the lane data, and a trajectory reward value at a final state representing a subsequent state of the reinforcement learning agent and a final timestamp, the trajectory reward value including:
r_t = α₁ · r_search
where r_search is a default reward value based on one of a safety function, a comfort function, and a compliance function with predetermined road rules, α₁ is a first calibration parameter based on a first driving mode, and r_t is the trajectory reward value;

modifying the baseline trajectory action based on the plurality of second hyperparameters, the sensor data, and the lane data processed by the reinforcement learning agent between the initial state and the final state to define a refined trajectory action; and

controlling the vehicle based on the refined trajectory action with respect to the sensor data and lane data to automatedly drive the vehicle to the destination.
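The hyperparameter-adjusting and reward steps of claim 1 can be sketched as follows. This is a minimal, non-authoritative illustration and not the patentee's implementation. The claim defines the activation's terms (input, W, b, o) but does not name the nonlinearity at this point, so tanh is an assumption here. The reward form r_t = α₁ · r_search is likewise an assumed reading of the reward equation. The function names (build_input, adjust_hyperparameters, trajectory_reward) and all sample numbers are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical sketch of two steps in claim 1. Function names, the tanh
# activation, and the reward form are illustrative assumptions.


def build_input(sensor: Sequence[float], lane: Sequence[float],
                first_hparams: Sequence[float]) -> List[float]:
    """Concatenate sensor data, lane data, and first hyperparameters into one input vector."""
    return [*sensor, *lane, *first_hparams]


def adjust_hyperparameters(
    inputs: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    activation: Callable[[float], float] = math.tanh,  # assumed activation
) -> List[float]:
    """Return o = activation(W · input + b); each o[i] is one second hyperparameter."""
    if len(weights) != len(biases):
        raise ValueError("need one bias per weight row")
    outputs = []
    for row, b in zip(weights, biases):
        if len(row) != len(inputs):
            raise ValueError("each weight row must match the input length")
        z = sum(w * x for w, x in zip(row, inputs)) + b  # scale by W, then shift by b
        outputs.append(activation(z))
    return outputs


def trajectory_reward(r_search: float, alpha_1: float) -> float:
    """Assumed reward form: default reward r_search scaled by driving-mode calibration alpha_1."""
    return alpha_1 * r_search


if __name__ == "__main__":
    x = build_input(sensor=[0.5], lane=[0.0], first_hparams=[1.0])
    W = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
    b = [0.0, -1.0]
    print(adjust_hyperparameters(x, W, b))  # second hyperparameters
    print(trajectory_reward(r_search=2.0, alpha_1=0.5))
```

A single affine layer followed by an activation is the smallest structure that uses the claim's W, b, and o. A practical planning policy could stack several such layers, and α₁ would be looked up from whichever driving mode is active before the reward is computed.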