US 11,989,020 B1
Training machine learning model(s), in simulation, for use in controlling autonomous vehicle(s)
James Andrew Bagnell, Pittsburgh, PA (US); Arun Venkatraman, Mountain View, CA (US); and Sanjiban Choudhury, Pittsburgh, PA (US)
Assigned to AURORA OPERATIONS, INC., Mountain View, CA (US)
Filed by Aurora Innovation, Inc., Palo Alto, CA (US)
Filed on Dec. 17, 2020, as Appl. No. 17/125,231.
Claims priority of provisional application 63/051,730, filed on Jul. 14, 2020.
Int. Cl. G05D 1/00 (2006.01); G05B 13/02 (2006.01); G06N 3/084 (2023.01)
CPC G05D 1/0088 (2013.01) [G05B 13/027 (2013.01); G05D 1/0221 (2013.01); G06N 3/084 (2013.01); G05D 2201/0213 (2013.01)]
20 Claims
 
1. A method for training a machine learning model for use in controlling autonomous vehicles, the method comprising:
obtaining an initial state instance from a past episode of manual driving of a real vehicle, the initial state instance comprising state data for a time instance of the past episode, the state data comprising sensor data from one or more sensors of the real vehicle or additional data derived from the sensor data corresponding to the time instance of the past episode;
obtaining, for the initial state instance and for each of a plurality of subsequent time instances of the past episode, a corresponding ground truth label that is based on manual driving data from the past episode or a subsequent annotation of the past episode;
performing a simulated episode of driving a simulated autonomous vehicle, wherein performing the simulated episode comprises:
initializing the simulated episode based on the initial state instance, wherein initializing the simulated episode based on the initial state instance comprises:
initializing the simulated autonomous vehicle in a simulated environment based on the sensor data from the one or more sensors of the real vehicle or the additional data derived from the sensor data at the time instance of the past episode;
progressing the simulated autonomous vehicle in the simulated environment during the simulated episode based on one or more predictions made by processing one or more instances of simulated state data using the machine learning model, wherein progressing the simulated autonomous vehicle in the simulated environment based on the one or more predictions causes the simulated autonomous vehicle to progress in a different manner than did the real vehicle in the past episode of manual driving; and
updating the machine learning model based on one or more losses that are based on a comparison of the one or more predictions and the corresponding ground truth labels.
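
The steps recited in claim 1 can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the patented implementation: the state is a single lateral offset, the "machine learning model" is one scalar weight `w`, the simulator is a one-line kinematic update, and the loss is squared error. All names here (`LoggedEpisode`, `record_manual_episode`, `simulate_and_update`) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class LoggedEpisode:
    """Hypothetical record of a past manual-driving episode."""
    states: list  # lateral offset at each time instance (stand-in for sensor-derived state)
    labels: list  # ground truth steering command at each time instance


def record_manual_episode(x0=2.0, steps=5, gain=-0.5):
    """Fabricate a toy log: a driver steering back toward the lane center."""
    states, labels = [], []
    x = x0
    for _ in range(steps):
        a = gain * x          # manual steering action (becomes the ground truth label)
        states.append(x)
        labels.append(a)
        x = x + a             # toy vehicle dynamics
    return LoggedEpisode(states, labels)


def simulate_and_update(w, episode, lr=0.1):
    """Run one simulated episode driven by the model, then update the model."""
    x = episode.states[0]     # initialize the simulation from the logged initial state
    sim_states, grad, loss = [], 0.0, 0.0
    for label in episode.labels:
        sim_states.append(x)
        pred = w * x          # prediction from simulated state data
        err = pred - label    # compare with the label for the same time instance
        loss += err ** 2
        grad += 2.0 * err * x # gradient w.r.t. w, treating the state as a constant
        x = x + pred          # the simulated vehicle follows the model, not the log
    n = len(episode.labels)
    return w - lr * grad / n, loss / n, sim_states


episode = record_manual_episode()
w = 0.0                       # untrained model
history = []
first_rollout = None
for _ in range(300):
    w, loss, sim_states = simulate_and_update(w, episode)
    history.append(loss)
    if first_rollout is None:
        first_rollout = sim_states
```

In this sketch the first simulated rollout starts from the logged initial state but, because the untrained model outputs zero steering, it departs from the logged trajectory at every later time instance. The loss is still computed against the labels from the log, and repeated simulated episodes move `w` toward the driver's gain.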