US 11,948,310 B2
Systems and methods for jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator
Vitor Guizilini, Santa Clara, CA (US); Rares A. Ambrus, San Francisco, CA (US); Kuan-Hui Lee, San Jose, CA (US); and Adrien David Gaidon, Mountain View, CA (US)
Assigned to Toyota Research Institute, Inc., Los Altos, CA (US)
Filed by Toyota Research Institute, Inc., Los Altos, CA (US)
Filed on Sep. 29, 2021, as Appl. No. 17/489,237.
Claims priority of provisional application 63/195,796, filed on Jun. 2, 2021.
Prior Publication US 2022/0392083 A1, Dec. 8, 2022
Int. Cl. G06K 9/00 (2022.01); G05D 1/00 (2006.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06T 7/246 (2017.01); G06T 7/50 (2017.01); G06T 7/55 (2017.01); G06T 7/73 (2017.01)
CPC G06T 7/248 (2017.01) [G05D 1/0221 (2013.01); G05D 1/0246 (2013.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06T 7/50 (2017.01); G06T 7/55 (2017.01); G06T 7/73 (2017.01); G06T 2207/10024 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)]
20 Claims
1. A system for jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator, the system comprising:
one or more processors; and
a memory communicably coupled to the one or more processors and storing:
an optical flow estimation module including instructions that when executed by the one or more processors cause the one or more processors to process a pair of temporally adjacent monocular image frames using a first neural network structure to produce a first optical flow estimate;
a depth and scene flow estimation module including instructions that when executed by the one or more processors cause the one or more processors to:
process the pair of temporally adjacent monocular image frames using a second neural network structure to produce an estimated depth map and an estimated scene flow; and
process the estimated depth map and the estimated scene flow using the second neural network structure to produce a second optical flow estimate; and
a training module including instructions that when executed by the one or more processors cause the one or more processors to impose a consistency loss between the first optical flow estimate and the second optical flow estimate that minimizes a difference between the first optical flow estimate and the second optical flow estimate to improve performance of the first neural network structure in estimating optical flow and the second neural network structure in estimating depth and scene flow.
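The claim does not specify how the second optical flow estimate is derived from the estimated depth map and scene flow, nor the form of the consistency loss. The following is a minimal, non-authoritative NumPy sketch of one common way to do both, under loudly stated assumptions: a pinhole camera with known intrinsics `K`, scene flow given as a per-pixel 3D displacement expressed in the first frame's camera coordinates (so any ego-motion is already folded in), and a mean absolute (L1) difference standing in for the unspecified "difference". The function names `flow_from_depth_and_scene_flow` and `flow_consistency_loss` are hypothetical and are not taken from the patent.

```python
import numpy as np


def flow_from_depth_and_scene_flow(depth, scene_flow, K):
    """Hypothetical helper: optical flow induced by depth plus 3D scene flow.

    depth:      (H, W) depth of each pixel in frame t (assumed > 0).
    scene_flow: (H, W, 3) 3D motion of each point, assumed to be expressed
                in frame t's camera coordinates.
    K:          (3, 3) pinhole intrinsics (an assumption; not in the claim).
    Returns:    (H, W, 2) pixel displacement (du, dv) from frame t to t+1.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixel coords

    # Back-project every pixel to a 3D point: X = depth * K^-1 [u, v, 1]^T.
    rays = pix @ np.linalg.inv(K).T
    points = rays * depth[..., None]

    # Move each point by its scene flow, then re-project with K.
    proj = (points + scene_flow) @ K.T
    z = np.clip(proj[..., 2:3], 1e-6, None)  # guard against points behind camera
    uv_next = proj[..., :2] / z

    # Induced optical flow is the change in pixel position.
    return uv_next - pix[..., :2]


def flow_consistency_loss(flow_a, flow_b):
    """Mean absolute (L1) difference between two flow fields (assumed loss form)."""
    return float(np.mean(np.abs(flow_a - flow_b)))
```

As a quick check, a fronto-parallel surface at depth 10 moving 1 unit along the camera x-axis, with focal length 100, should induce a uniform horizontal flow of 100 × 1 / 10 = 10 pixels. In an actual training setup the same arithmetic would be written in an autodiff framework so that the loss between this induced flow and the first network's direct flow estimate backpropagates into both neural network structures, which is the joint-training effect the claim describes.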