CPC G06T 7/11 (2017.01) [G06T 7/50 (2017.01); G06V 10/25 (2022.01); G06V 10/32 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 20/20 (2022.01); G06T 2207/20084 (2013.01); G06T 2207/30196 (2013.01)]

20 Claims

1. A method for obtaining scene segmentation, the method comprising:
obtaining, from an image sensor, image data of a real-world scene;
obtaining, from a depth sensor, sparse depth data of the real-world scene;
passing the image data to a first neural network to obtain one or more object regions of interest (ROIs) and one or more feature map ROIs, wherein each object ROI comprises at least one detected object;
passing the image data and the sparse depth data to a second neural network to obtain one or more dense depth map ROIs;
aligning the one or more object ROIs, the one or more feature map ROIs, and the one or more dense depth map ROIs; and
passing the aligned one or more object ROIs, the aligned one or more feature map ROIs, and the aligned one or more dense depth map ROIs to a fully convolutional network to obtain a segmentation of the real-world scene, wherein the segmentation contains one or more pixelwise predictions of one or more objects in the real-world scene;
wherein aligning the one or more object ROIs, the one or more feature map ROIs, and the one or more dense depth map ROIs comprises resizing, using an image-guided filter, at least some of the one or more object ROIs, the one or more feature map ROIs, and the one or more dense depth map ROIs to a common size.
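The following is a minimal structural sketch of the pipeline recited in claim 1, assuming PyTorch. Every class and function name (DetectionNet, DepthCompletionNet, align_roi, SegmentationFCN) is a hypothetical placeholder rather than anything named in the patent, the ROI crops are fixed toy regions standing in for real detector outputs, and plain bilinear interpolation stands in for the claimed image-guided filter; this is an illustration of the claimed data flow, not the patentee's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DetectionNet(nn.Module):
    """First network: object ROIs and feature-map ROIs from the image data."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, image):
        feats = self.backbone(image)              # stride-2 feature map
        object_roi = image[:, :, 40:104, 40:104]  # placeholder crop around a "detected object"
        feature_roi = feats[:, :, 20:52, 20:52]   # corresponding feature-map crop
        return object_roi, feature_roi


class DepthCompletionNet(nn.Module):
    """Second network: dense depth from image data plus sparse depth data."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, image, sparse_depth):
        dense = self.net(torch.cat([image, sparse_depth], dim=1))
        return dense[:, :, 40:104, 40:104]        # dense depth-map ROI for the same region


def align_roi(roi, size):
    """Resize an ROI to the common size. Plain bilinear interpolation is used
    here; the claim recites an image-guided filter for this resizing step."""
    return F.interpolate(roi, size=size, mode="bilinear", align_corners=False)


class SegmentationFCN(nn.Module):
    """Fully convolutional head: pixelwise predictions over the aligned ROIs."""
    def __init__(self, in_channels, num_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1))

    def forward(self, x):
        return self.head(x)


# Random tensors stand in for sensor data.
image = torch.rand(1, 3, 128, 128)          # image-sensor data
sparse_depth = torch.rand(1, 1, 128, 128)   # sparse depth-sensor data

object_roi, feature_roi = DetectionNet()(image)
depth_roi = DepthCompletionNet()(image, sparse_depth)

common_size = (64, 64)                      # common ROI size for alignment
aligned = torch.cat([align_roi(r, common_size)
                     for r in (object_roi, feature_roi, depth_roi)], dim=1)

segmentation = SegmentationFCN(in_channels=aligned.shape[1])(aligned)
print(segmentation.shape)                   # (1, num_classes, 64, 64) pixelwise predictions
```

Concatenating the three aligned ROI tensors channel-wise is only one plausible way to present them jointly to the fully convolutional network; the claim specifies that the aligned ROIs are passed to that network but not how they are combined.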