CPC G06V 20/58 (2022.01) [B60W 60/001 (2020.02); G06N 3/08 (2013.01); B60W 2420/52 (2013.01); B60W 2554/4049 (2020.02)] | 26 Claims |
1. A method of training a neural network to detect one or more objects in an environment, the method comprising:
obtaining a network input representing the environment, wherein the input comprises sensor data for each of a plurality of locations in the environment;
processing the network input using a first subnetwork of the neural network to generate a respective embedding for each of the plurality of locations in the environment;
processing the embeddings for each of the plurality of locations in the environment using a second subnetwork of the neural network to generate, for each of the plurality of locations in the environment, an object prediction that characterizes a possible object at the location in the environment;
processing the embeddings for each of the plurality of locations in the environment using a third subnetwork of the neural network to generate an updated embedding for each of the plurality of locations in the environment;
determining, for each of a plurality of pairs of the plurality of locations in the environment, whether the respective object predictions of the pair of locations characterize the same possible object or different possible objects;
computing a respective contrastive loss value for each of the plurality of pairs of locations in the environment, wherein:
for each pair of locations whose object predictions characterize the same possible object, the corresponding contrastive loss value is proportional to a difference between the respective updated embeddings of the pair of locations; and
for each pair of locations whose object predictions characterize different possible objects, the corresponding contrastive loss value is inversely proportional to a difference between the respective updated embeddings of the pair of locations; and
updating values for (i) a plurality of parameters of the first subnetwork and (ii) a plurality of parameters of the third subnetwork using the computed contrastive loss values.
|
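The contrastive loss step of claim 1 can be illustrated with a minimal sketch. The claim only requires that the loss be proportional to the embedding difference for same-object pairs and inversely proportional to it for different-object pairs. The specific choices here are assumptions made for illustration: Euclidean distance as the "difference", the `1 / (distance + eps)` form, the function name `pairwise_contrastive_losses`, and the `object_ids` list, which stands in for the claimed determination of whether two object predictions characterize the same possible object. The subnetworks themselves and the parameter update are omitted.

```python
import math


def pairwise_contrastive_losses(updated_embeddings, object_ids, eps=1e-6):
    """Return {(i, j): loss} for every pair of locations with i < j.

    updated_embeddings: one vector per location (hypothetical output of the
        third subnetwork).
    object_ids: one label per location; equal labels mean the two object
        predictions are treated as the same possible object (an assumption
        standing in for the claimed determination step).
    eps: small constant that avoids division by zero (an assumption).
    """
    losses = {}
    n = len(updated_embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance between the two updated embeddings.
            dist = math.dist(updated_embeddings[i], updated_embeddings[j])
            if object_ids[i] == object_ids[j]:
                # Same possible object: loss grows with the distance,
                # so minimizing it pulls the embeddings together.
                losses[(i, j)] = dist
            else:
                # Different possible objects: loss shrinks as the distance
                # grows, so minimizing it pushes the embeddings apart.
                losses[(i, j)] = 1.0 / (dist + eps)
    return losses


# Three locations: the first two are assigned to one object, the third to another.
embeddings = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
object_ids = [0, 0, 1]
losses = pairwise_contrastive_losses(embeddings, object_ids)
```

In a full training loop, the summed loss values would be backpropagated to update only the first and third subnetworks, as the final step of the claim recites; that update is not shown here.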