CPC B60W 60/00276 (2020.02) [B60W 60/0011 (2020.02); G06N 20/00 (2019.01); G06V 20/58 (2022.01); B60W 2720/103 (2013.01)] | 20 Claims |
1. A method comprising:
receiving log data including at least one of sensor data associated with a first time or perception data associated with the first time and determined based at least in part on the sensor data, and an action taken by a target vehicle at a second time after the first time;
in a first training stage, receiving, from a machine-learned (ML) component, a predicted action associated with the target vehicle, wherein the ML component determines the predicted action based at least in part on at least one of the sensor data or the perception data;
in the first training stage, modifying the ML component to reduce a difference between the predicted action and the action taken by the target vehicle;
in a second training stage subsequent to the first training stage, controlling a vehicle based at least in part on output of the ML component determined based at least in part on data associated with the vehicle;
in the second training stage, determining a reward associated with a performance of the vehicle while the vehicle was controlled based at least in part on the output of the ML component;
in the second training stage, modifying the ML component based at least in part on the reward; and
transmitting the ML component to another vehicle, wherein operation of the other vehicle is based at least in part on executing the ML component.
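The claim does not specify the model, the imitation loss, the reward, the simulator, or the reward-driven update rule. The Python sketch below is a minimal, non-authoritative illustration of the two-stage flow under loudly stated assumptions, and every name in it is hypothetical. The ML component is assumed to be a linear policy that maps a speed-gap feature to an acceleration. The first stage is assumed to be behavior cloning by SGD on squared error against logged actions. The second stage is assumed to control perturbed copies of the policy in a hypothetical one-step simulator and score them with an assumed reward that penalizes the remaining gap and large actions. The parameter update uses antithetic evolution strategies, a simple reward-driven method swapped in for illustration rather than taken from the source. JSON stands in for transmission to another vehicle.

```python
import json
import random
from dataclasses import dataclass, asdict


@dataclass
class LinearPolicy:
    """Hypothetical ML component: maps features (assumed to be derived from
    sensor or perception data) to a scalar action such as an acceleration."""
    weights: list[float]
    bias: float = 0.0

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

    def params(self):
        return self.weights + [self.bias]

    @classmethod
    def from_params(cls, p):
        return cls(weights=list(p[:-1]), bias=p[-1])


def imitation_stage(policy, log, lr=0.05, epochs=200):
    """First stage (assumed behavior cloning): SGD on the squared difference
    between the predicted action and the logged action of the target vehicle."""
    for _ in range(epochs):
        for features, logged_action in log:
            err = policy.predict(features) - logged_action
            policy.weights = [w - lr * err * x for w, x in zip(policy.weights, features)]
            policy.bias -= lr * err
    return policy


def rollout_reward(policy, scenarios, comfort=0.25):
    """Hypothetical one-step simulator: the feature is the gap to a target speed;
    applying action a for one time unit leaves a gap of (gap - a). The assumed
    reward penalizes the remaining gap and, for comfort, large actions."""
    total = 0.0
    for gap in scenarios:
        a = policy.predict([gap])
        total -= (gap - a) ** 2 + comfort * a ** 2
    return total / len(scenarios)


def reward_stage(policy, scenarios, lr=0.02, sigma=0.1, pairs=16, iters=100, seed=0):
    """Second stage: antithetic evolution strategies (swapped in for illustration).
    Each iteration controls perturbed copies of the policy, scores them with the
    reward, and steps the parameters along the estimated reward gradient."""
    rng = random.Random(seed)
    theta = policy.params()
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for _ in range(pairs):
            eps = [rng.gauss(0.0, 1.0) for _ in theta]
            plus = LinearPolicy.from_params([t + sigma * e for t, e in zip(theta, eps)])
            minus = LinearPolicy.from_params([t - sigma * e for t, e in zip(theta, eps)])
            diff = rollout_reward(plus, scenarios) - rollout_reward(minus, scenarios)
            for i, e in enumerate(eps):
                grad[i] += diff * e / (2 * sigma * pairs)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return LinearPolicy.from_params(theta)


def transmit(policy):
    """Serialize the trained component so another vehicle can load and execute it."""
    return json.dumps(asdict(policy))


def receive(payload):
    return LinearPolicy(**json.loads(payload))


if __name__ == "__main__":
    gaps = [-2.0, -1.0, 0.5, 1.0, 2.0]
    log = [([g], 0.5 * g) for g in gaps]  # hypothetical logged (feature, action) pairs
    policy = imitation_stage(LinearPolicy(weights=[0.0]), log)
    policy = reward_stage(policy, gaps)
    fleet_copy = receive(transmit(policy))
    print(fleet_copy)
```

In this toy setup, the logged actions (0.5 times the gap) differ from what the assumed reward prefers, so the second stage moves the policy away from pure imitation. This mirrors the claim's pattern of pretraining on log data and then refining on a performance reward.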