CPC G05D 1/0251 (2013.01) [G06T 7/11 (2017.01); G06T 7/50 (2017.01); G06T 9/00 (2013.01); G06V 10/82 (2022.01); G06V 20/56 (2022.01); G06V 30/274 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/30252 (2013.01)] | 17 Claims |
1. A method for controlling an ego agent comprising:
capturing a first two-dimensional (2D) image associated with a current view of an environment corresponding to a current location of the ego agent;
generating, via a semantic encoder, a semantically segmented image of the first 2D image;
generating, via a depth estimation network, a depth map of the current view of the environment based on the semantically segmented image, the depth estimation network being trained via a disposable two-stream network including a red-green-blue (RGB) encoder, a semantic encoder, and a depth decoder shared between the RGB encoder and the semantic encoder, the depth estimation network being limited to the semantic encoder and the depth decoder during inference;
generating a three-dimensional (3D) reconstruction of the first 2D image based on the depth map and a current pose of the ego agent, the current pose being based on the first 2D image and a second 2D image, the second 2D image being associated with a view of the environment that is different from the current view; and
controlling an action of the ego agent based on the 3D reconstruction.
|
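The 3D reconstruction step of claim 1 combines a per-pixel depth map with the current pose of the ego agent. The claim does not state how the two are combined. The sketch below illustrates one common approach, assuming a pinhole camera model: each pixel is back-projected into the camera frame using its depth, then moved into the world frame using the pose. The function name `backproject`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the representation of the pose as a rotation matrix plus a translation vector are illustrative assumptions and do not come from the claim.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Point]:
    """Lift a depth map to world-frame 3D points (hypothetical helper).

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy). The pose is given as a 3x3 camera-to-world rotation
    and a 3-vector translation. Pixels with non-positive depth are skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):          # v: pixel row index
        for u, z in enumerate(row):          # u: pixel column index
            if z <= 0:
                continue                     # no valid depth at this pixel
            # Back-project the pixel into the camera frame.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Apply the pose: world = R * cam + t.
            world = tuple(
                sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            points.append(world)
    return points


# Usage: a 2x2 depth map, identity rotation, camera shifted 1 unit along x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = backproject(
    depth=[[2.0, 0.0], [2.0, 4.0]],
    fx=1.0, fy=1.0, cx=0.0, cy=0.0,
    rotation=identity,
    translation=[1.0, 0.0, 0.0],
)
```

In this sketch the depth map would come from the depth estimation network and the pose from the two views of the environment. Both are supplied here as plain numbers so that the geometry can be followed in isolation.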