US 12,354,415 B2
Method and system for learning reward functions for driving using positive-unlabeled reward learning
Blake Wulfe, San Francisco, CA (US); and Adrien Gaidon, San Jose, CA (US)
Assigned to Toyota Research Institute, Inc., Los Altos, CA (US)
Filed by Toyota Research Institute, Los Altos, CA (US)
Filed on Jan. 27, 2022, as Appl. No. 17/586,389.
Prior Publication US 2023/0237856 A1, Jul. 27, 2023
Int. Cl. G06N 20/00 (2019.01); G07C 5/00 (2006.01); G07C 5/02 (2006.01); B60W 60/00 (2020.01)
CPC G07C 5/008 (2013.01) [G06N 20/00 (2019.01); G07C 5/02 (2013.01); B60W 60/001 (2020.02); B60W 2552/00 (2020.02); B60W 2554/4046 (2020.02); B60W 2555/00 (2020.02)] 16 Claims
1. A method comprising:
receiving first driving data associated with a first vehicle;
receiving second driving data associated with one or more vehicles around the first vehicle;
creating training data comprising the first driving data labeled as positive data and the second driving data as unlabeled data;
using the training data to train a classifier to predict whether driving data input to the classifier is positive or unlabeled;
receiving third driving data associated with a second vehicle;
inputting the third driving data to the classifier after the classifier has been trained;
determining a reward function based on an output of the classifier;
determining a driving policy based on the reward function; and
causing the second vehicle to drive autonomously based on the driving policy.
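The steps of claim 1 can be illustrated with a toy positive-unlabeled pipeline. This is a minimal sketch under stated assumptions, not the patented implementation. The following are all hypothetical choices that the claim does not specify: the two-element feature vector `[|lateral offset|, |jerk|]`, the synthetic data, plain logistic regression as the classifier, the logit of the classifier output as the reward, and a greedy one-step policy over a toy `toy_transition` model. The sketch labels the first vehicle's data 1 (positive) and the surrounding vehicles' data 0 (unlabeled), trains the classifier, and scores a second vehicle's data. It then picks the action whose predicted next state earns the highest learned reward.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_pu_classifier(positive, unlabeled, lr=0.1, epochs=200, seed=0):
    """Logistic regression fit by SGD: label 1 = positive (first vehicle),
    label 0 = unlabeled (surrounding vehicles). Assumed classifier choice."""
    data = [(x, 1.0) for x in positive] + [(x, 0.0) for x in unlabeled]
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # derivative of the log-loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def classifier_prob(model, x):
    """Classifier output: estimated probability that x belongs to the positive set."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def reward(model, x, eps=1e-6):
    """Hypothetical reward: the logit log D(x) - log(1 - D(x)) of the classifier output."""
    d = min(max(classifier_prob(model, x), eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)

def greedy_policy(model, state, actions, transition):
    """Hypothetical one-step policy: choose the action whose predicted
    next state receives the highest learned reward."""
    return max(actions, key=lambda a: reward(model, transition(state, a)))

def toy_transition(state, action):
    """Hypothetical dynamics on features [|lateral offset|, |jerk|]."""
    offset, _ = state
    if action == "ease_back":
        return [max(0.0, offset - 0.2), 0.1]
    return [offset + 0.5, 1.5]  # "swerve"

def demo(seed=0):
    rng = random.Random(seed)

    def sample(mu, sd):
        return [abs(rng.gauss(mu, sd)), abs(rng.gauss(mu, sd))]

    # Synthetic stand-ins: the first vehicle drives smoothly, while the
    # surrounding vehicles are a mix of smooth and erratic drivers.
    positive = [sample(0.0, 0.2) for _ in range(100)]
    unlabeled = [sample(0.0, 0.2) for _ in range(50)] + [sample(1.5, 0.3) for _ in range(50)]
    model = train_pu_classifier(positive, unlabeled)
    # "Third driving data" from a second vehicle, scored via the trained classifier.
    third = [0.5, 0.3]
    action = greedy_policy(model, third, ["ease_back", "swerve"], toy_transition)
    return model, action

if __name__ == "__main__":
    model, action = demo()
    print("weights:", model, "chosen action:", action)
```

Because the unlabeled set mixes smooth drivers (who resemble the positive data) with erratic ones, the classifier assigns lower probability, and therefore lower reward, to erratic-looking states. In this toy setting the greedy policy prefers `"ease_back"`. A practical system would replace the greedy step with a proper policy-optimization method and richer driving features.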