| CPC G06V 20/58 (2022.01) [G06T 7/55 (2017.01); G06V 10/44 (2022.01); G06V 10/82 (2022.01); G06T 3/4046 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30252 (2013.01); G06V 10/40 (2022.01); G06V 10/70 (2022.01); G06V 20/698 (2022.01); G06V 30/18086 (2022.01)] | 20 Claims |

|
1. A method comprising:
obtaining one or more perspective camera images of an environment;
generating, using a first neural network (NN), for each pixel of a set of pixels of the one or more perspective camera images,
a feature vector (FV), and
a depth distribution for a portion of the environment imaged by a corresponding pixel, wherein the first NN is trained using a plurality of training images and a depth ground truth data for the plurality of training images;
obtaining, for each pixel of the set of pixels, a feature tensor (FT) in view of (i) the FV for a respective pixel and (ii) the depth distribution for the respective pixel; and
processing the obtained FTs, using a second NN, to identify one or more objects in the environment.
|
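The claim does not specify how the feature tensor is formed from the feature vector and the depth distribution. As an illustrative sketch only, one common choice is the outer product of a per-pixel depth probability vector with the feature vector, which places the pixel's features along its camera ray weighted by depth likelihood. The names `softmax`, `lift_pixel`, `fv`, and `depth_logits` are hypothetical, and the softmax over discrete depth bins is an assumption rather than something recited in the claim.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw depth scores into a probability distribution over depth bins."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift_pixel(feature_vector: List[float], depth_logits: List[float]) -> List[List[float]]:
    """Build a D x C feature tensor for one pixel (assumed outer-product form).

    Row d holds the feature vector scaled by the probability that the
    imaged surface lies in depth bin d.
    """
    depth_probs = softmax(depth_logits)
    return [[p * f for f in feature_vector] for p in depth_probs]


# Hypothetical per-pixel outputs of the first NN: C = 3 channels, D = 4 depth bins.
fv = [0.5, -1.0, 2.0]
depth_logits = [0.0, 0.0, 0.0, 0.0]  # uniform depth distribution

ft = lift_pixel(fv, depth_logits)
print(len(ft), len(ft[0]))  # depth bins, channels
```

Under this assumption, summing the tensor over its depth bins recovers the original feature vector, because the depth probabilities sum to one. A second network would then consume the tensors of all pixels, for example after pooling them into a common grid, to detect objects.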