US 11,966,234 B2
System and method for monocular depth estimation from semantic information
Vitor Guizilini, Santa Clara, CA (US); Jie Li, Los Altos, CA (US); Rares A. Ambrus, San Francisco, CA (US); Sudeep Pillai, Santa Clara, CA (US); and Adrien Gaidon, San Jose, CA (US)
Assigned to TOYOTA RESEARCH INSTITUTE, INC., Los Altos, CA (US)
Filed by TOYOTA RESEARCH INSTITUTE, INC., Los Altos, CA (US)
Filed on Jul. 23, 2020, as Appl. No. 16/937,470.
Prior Publication US 2022/0026918 A1, Jan. 27, 2022
Int. Cl. G06T 17/00 (2006.01); G05D 1/00 (2006.01); G06T 7/11 (2017.01); G06T 7/50 (2017.01); G06T 9/00 (2006.01); G06V 10/82 (2022.01); G06V 20/56 (2022.01); G06V 30/262 (2022.01)
CPC G05D 1/0251 (2013.01) [G06T 7/11 (2017.01); G06T 7/50 (2017.01); G06T 9/00 (2013.01); G06V 10/82 (2022.01); G06V 20/56 (2022.01); G06V 30/274 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/30252 (2013.01)]
17 Claims
 
1. A method for controlling an ego agent comprising:
capturing a first two-dimensional (2D) image associated with a current view of an environment corresponding to a current location of the ego agent;
generating, via a semantic encoder, a semantically segmented image of the first 2D image;
generating, via a depth estimation network, a depth map of the current view of the environment based on the semantically segmented image, the depth estimation network being trained via a disposable two-stream network including a red-green-blue (RGB) encoder, a semantic encoder, and a depth decoder shared between the RGB encoder and the semantic encoder, the depth estimation network being limited to the semantic encoder and the depth decoder during inference;
generating a three-dimensional (3D) reconstruction of the first 2D image based on the depth map and a current pose of the ego agent, the current pose being based on the first 2D image and the second 2D image, the second 2D image associated with a view of the environment that is different from the current view; and
controlling an action of the ego agent based on the 3D estimate of the environment.
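
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It reads the claim as follows: during training two encoders feed one shared depth decoder, at inference only the semantic encoder and the depth decoder are kept, and the resulting depth map is lifted to 3D using the pose. The names TwoStreamDepthNet, Intrinsics, and unproject are hypothetical, the encoders and decoder are stand-in callables rather than trained networks, and the pinhole camera model is an assumption, since the claim names no camera model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Grid = List[List[float]]  # H x W array stored as nested lists
Point = Tuple[float, ...]  # (x, y, z) in world coordinates
Pose = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (3x3 rotation, translation)


@dataclass
class Intrinsics:
    """Pinhole camera parameters (assumed; not specified by the claim)."""
    fx: float
    fy: float
    cx: float
    cy: float


class TwoStreamDepthNet:
    """Hypothetical training-time wrapper: two encoders, one shared decoder."""

    def __init__(self,
                 rgb_encoder: Callable[[Grid], Grid],
                 semantic_encoder: Callable[[Grid], Grid],
                 depth_decoder: Callable[[Grid], Grid]) -> None:
        self.rgb_encoder = rgb_encoder
        self.semantic_encoder = semantic_encoder
        self.depth_decoder = depth_decoder

    def training_forward(self, rgb: Grid, semantic: Grid) -> Tuple[Grid, Grid]:
        # Both streams pass through the SAME decoder object, so the decoder
        # is shared between the RGB and semantic encoders during training.
        depth_from_rgb = self.depth_decoder(self.rgb_encoder(rgb))
        depth_from_semantic = self.depth_decoder(self.semantic_encoder(semantic))
        return depth_from_rgb, depth_from_semantic

    def inference_model(self) -> Callable[[Grid], Grid]:
        # The RGB stream is disposable: the returned callable only holds
        # the semantic encoder and the depth decoder.
        encoder, decoder = self.semantic_encoder, self.depth_decoder
        return lambda semantic: decoder(encoder(semantic))


def unproject(depth: Grid, k: Intrinsics, pose: Pose) -> List[Point]:
    """Lift every pixel (u, v) with depth d to a 3D world point."""
    rotation, translation = pose
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            # Pinhole back-projection into the camera frame.
            cam = ((u - k.cx) / k.fx * d, (v - k.cy) / k.fy * d, d)
            # Rigid transform into the world frame: R @ cam + t.
            world = tuple(
                sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            points.append(world)
    return points


if __name__ == "__main__":
    # Stand-in callables; real encoders and decoders would be neural networks.
    net = TwoStreamDepthNet(
        rgb_encoder=lambda img: [[p / 255.0 for p in row] for row in img],
        semantic_encoder=lambda seg: [[float(c) for c in row] for row in seg],
        depth_decoder=lambda feat: [[1.0 + f for f in row] for row in feat],
    )
    infer = net.inference_model()
    depth_map = infer([[0, 1], [1, 0]])  # toy semantic class map
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cloud = unproject(depth_map, Intrinsics(1.0, 1.0, 0.0, 0.0),
                      (identity, [0.0, 0.0, 0.0]))
    print(len(cloud), "points reconstructed")
```

In this sketch the pose is passed in as a given rotation and translation. The claim derives the pose from the first and second 2D images, a step that is not modeled here.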