| CPC G07C 5/008 (2013.01) [G06N 20/00 (2019.01); G07C 5/02 (2013.01); B60W 60/001 (2020.02); B60W 2552/00 (2020.02); B60W 2554/4046 (2020.02); B60W 2555/00 (2020.02)] | 16 Claims |

|
1. A method comprising:
receiving first driving data associated with a first vehicle;
receiving second driving data associated with one or more vehicles around the first vehicle;
creating training data comprising the first driving data labeled as positive data and the second driving data as unlabeled data;
using the training data to train a classifier to predict whether driving data input to the classifier is positive or unlabeled;
receiving third driving data associated with a second vehicle;
inputting the third driving data to the classifier after the classifier has been trained;
determining a reward function based on an output of the classifier;
determining a driving policy based on the reward function; and
causing the second vehicle to drive autonomously based on the driving policy.
|
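The steps recited in claim 1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the two-element feature vectors (normalized speed, normalized headway), the synthetic data, the use of plain logistic regression as the positive/unlabeled classifier, the log-probability reward, and the greedy choice among candidate states that stands in for "determining a driving policy". The function names `train_pu_classifier`, `reward`, and `greedy_policy` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_pu_classifier(positive, unlabeled, epochs=300, lr=0.5):
    """Fit logistic regression with positive samples as class 1 and
    unlabeled samples as class 0 (assumed stand-in for the claimed classifier)."""
    data = [(x, 1.0) for x in positive] + [(x, 0.0) for x in unlabeled]
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Full-batch gradient descent step on the mean log-loss.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def classify(x):
        # Estimated probability that x resembles the positive driving data.
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    return classify


def reward(classify, x):
    # Assumed reward: log of the classifier output (small epsilon avoids log(0)).
    return math.log(classify(x) + 1e-9)


def greedy_policy(classify, candidates):
    # Assumed policy: pick the candidate state with the highest reward.
    return max(candidates, key=lambda x: reward(classify, x))


# Synthetic data (hypothetical): the first vehicle's samples are positive;
# surrounding vehicles are unlabeled and mix similar and dissimilar driving.
rng = random.Random(0)
first_vehicle = [[rng.gauss(0.5, 0.05), rng.gauss(0.8, 0.05)] for _ in range(50)]
surrounding = (
    [[rng.gauss(0.9, 0.05), rng.gauss(0.2, 0.05)] for _ in range(40)]
    + [[rng.gauss(0.5, 0.05), rng.gauss(0.8, 0.05)] for _ in range(10)]
)

classify = train_pu_classifier(first_vehicle, surrounding)

# Third driving data: candidate states for a second vehicle.
candidates = [[0.9, 0.2], [0.5, 0.8], [0.7, 0.5]]
chosen = greedy_policy(classify, candidates)
print("chosen state:", chosen)
```

In this sketch the classifier output acts as a similarity score to the first vehicle's driving, so states resembling the positive data receive higher reward. A practical system would more likely optimize the policy with reinforcement learning over trajectories than with a one-step greedy choice.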