US 11,875,521 B2
Self-occlusion masks to improve self-supervised monocular depth estimation in multi-camera settings
Vitor Guizilini, Santa Clara, CA (US); Rares Andrei Ambrus, Santa Clara, CA (US); Adrien David Gaidon, San Francisco, CA (US); Igor Vasiljevic, Chicago, IL (US); and Gregory Shakhnarovich, Chicago, IL (US)
Assigned to TOYOTA RESEARCH INSTITUTE, INC., Los Altos, CA (US)
Filed by TOYOTA RESEARCH INSTITUTE, INC., Los Altos, CA (US)
Filed on Jul. 26, 2021, as Appl. No. 17/385,358.
Claims priority of provisional application 63/161,614, filed on Mar. 16, 2021.
Prior Publication US 2022/0301212 A1, Sep. 22, 2022
Int. Cl. G06T 7/55 (2017.01); B60R 1/00 (2022.01); G06T 3/00 (2006.01); G05D 1/02 (2020.01); G06N 3/08 (2023.01); G06T 7/579 (2017.01); G06T 7/292 (2017.01); G06T 7/11 (2017.01); B60W 60/00 (2020.01); G06T 3/40 (2006.01); G06F 18/214 (2023.01); H04N 23/90 (2023.01)
CPC G06T 7/55 (2017.01) [B60R 1/00 (2013.01); B60W 60/001 (2020.02); G05D 1/0212 (2013.01); G05D 1/0246 (2013.01); G06F 18/214 (2023.01); G06F 18/2148 (2023.01); G06N 3/08 (2013.01); G06T 3/0012 (2013.01); G06T 3/0093 (2013.01); G06T 3/40 (2013.01); G06T 7/11 (2017.01); G06T 7/292 (2017.01); G06T 7/579 (2017.01); H04N 23/90 (2023.01); B60R 2300/102 (2013.01); B60W 2420/42 (2013.01); G05D 2201/0213 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30244 (2013.01); G06T 2207/30252 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A method for self-supervised depth and ego-motion estimation, the method comprising:
determining a multi-camera photometric loss associated with a multi-camera rig of an ego vehicle;
generating a self-occlusion mask by manually segmenting self-occluded areas of images captured by the multi-camera rig of the ego vehicle;
multiplying the multi-camera photometric loss with the self-occlusion mask to form a self-occlusion masked photometric loss;
training a depth estimation model and an ego-motion estimation model according to the self-occlusion masked photometric loss by leveraging cross-camera temporal contexts via spatio-temporal photometric constraints to increase an amount of overlap between cameras of the multi-camera rig using a predicted ego-motion of the ego vehicle; and
predicting a 360° point cloud of a scene surrounding the ego vehicle according to the depth estimation model and the ego-motion estimation model.
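The masking step recited in the claim — multiplying a per-pixel photometric loss by a binary self-occlusion mask so that pixels imaging the ego vehicle's own body do not contribute to training — can be sketched as follows. This is an illustrative example only, not the patented implementation; the function name, array shapes, and the use of an L1 photometric error are assumptions for the sketch.

```python
import numpy as np

def masked_photometric_loss(target, synthesized, self_occlusion_mask):
    """Illustrative sketch (not the patented implementation).

    target, synthesized: float arrays of shape (H, W, 3), values in [0, 1];
        `synthesized` is the view reconstructed from a context frame using
        predicted depth and ego-motion.
    self_occlusion_mask: float array of shape (H, W);
        1 = valid pixel, 0 = pixel self-occluded by the ego vehicle.
    """
    # Per-pixel L1 photometric error, averaged over color channels.
    photometric = np.abs(target - synthesized).mean(axis=-1)  # (H, W)

    # Multiply by the self-occlusion mask so masked pixels contribute
    # nothing, then average over valid pixels only.
    masked = photometric * self_occlusion_mask
    return masked.sum() / np.maximum(self_occlusion_mask.sum(), 1.0)

# Toy example: the bottom row is masked out as self-occluded
# (e.g., the camera sees the vehicle's own hood or fender there).
H, W = 4, 4
target = np.full((H, W, 3), 0.5)
synthesized = np.full((H, W, 3), 0.6)
mask = np.ones((H, W))
mask[3, :] = 0.0  # hypothetical self-occluded region

loss = masked_photometric_loss(target, synthesized, mask)
```

Because the mask zeroes the loss (and hence the gradient) at self-occluded pixels, the depth and ego-motion networks are never penalized for failing to "reconstruct" the vehicle's own body, which otherwise corrupts self-supervised training in surround-view rigs.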