CPC G06V 20/58 (2022.01) [G06T 7/20 (2013.01); G06V 10/62 (2022.01); G06V 10/82 (2022.01); B60W 60/001 (2020.02); B60W 2420/403 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/20084 (2013.01)] | 20 Claims |
1. A method comprising:
obtaining, by one or more sensors, a plurality of images of an environment, wherein each image of the plurality of images is associated with a corresponding time of a plurality of times;
generating, by one or more processing devices, a plurality of sets of feature tensors (FTs), wherein each set of FTs is associated with one or more objects of the environment depicted in a respective image of the plurality of images;
obtaining, using the plurality of sets of FTs, a combined FT;
processing the combined FT using an encoder neural network (NN) to generate a plurality of object vectors, each object vector of the plurality of object vectors characterizing association of an individual FT of the plurality of sets of FTs with other FTs of the plurality of sets of FTs; and
processing, using a decoder NN, the plurality of object vectors to identify one or more tracks, wherein each track of the one or more tracks characterizes motion of a respective object of the one or more objects of the environment.
|
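The data flow recited in claim 1 can be illustrated with a minimal sketch, assuming NumPy and randomly initialized weights in place of trained networks. Everything named here is hypothetical and does not come from the claim: the helper functions (`combine_feature_tensors`, `encoder`, `decoder`), the appended timestamp column, the single self-attention layer standing in for the encoder NN, and the "track query" scoring standing in for the decoder NN. The claim does not specify any of these design choices; the sketch only shows one way the recited steps could fit together.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def combine_feature_tensors(ft_sets, times):
    # Stack the per-image FT sets into one combined FT.
    # Assumption: each FT gets its image timestamp appended as an extra column.
    rows = []
    for fts, t in zip(ft_sets, times):
        t_col = np.full((fts.shape[0], 1), t, dtype=fts.dtype)
        rows.append(np.hstack([fts, t_col]))
    return np.vstack(rows)

def encoder(combined, w_q, w_k, w_v):
    # Stand-in for the encoder NN: one self-attention layer.
    # Each output row (an "object vector") is a weighted mix over all FTs,
    # so it reflects how that FT relates to the other FTs.
    q, k, v = combined @ w_q, combined @ w_k, combined @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))
    return attn @ v

def decoder(object_vectors, track_queries):
    # Stand-in for the decoder NN: score every object vector against a
    # fixed set of hypothetical track queries and pick the best match.
    logits = object_vectors @ track_queries.T
    return logits.argmax(axis=1)

# Toy input: 3 images at 3 times, with 2, 3, and 2 detected objects.
feat_dim, model_dim, num_queries = 4, 8, 5
times = [0.0, 0.1, 0.2]
ft_sets = [rng.normal(size=(n, feat_dim)) for n in (2, 3, 2)]

combined = combine_feature_tensors(ft_sets, times)       # shape (7, 5)
w_q, w_k, w_v = (rng.normal(size=(feat_dim + 1, model_dim)) for _ in range(3))
track_queries = rng.normal(size=(num_queries, model_dim))

object_vectors = encoder(combined, w_q, w_k, w_v)        # shape (7, 8)
track_ids = decoder(object_vectors, track_queries)       # shape (7,)

# Group FTs by assigned track; each track is a list of (time, FT row index).
tracks = {}
for row, tid in enumerate(track_ids):
    tracks.setdefault(int(tid), []).append((float(combined[row, -1]), row))
```

With untrained weights the resulting track assignments are arbitrary; the sketch is meant only to show the shapes and the order of operations, from per-image FT sets to a combined FT, to object vectors, to tracks.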