US 12,243,243 B2
Method and apparatus with scene flow estimation
Youngjun Kwak, Seoul (KR); Taekyung Kim, Daejeon (KR); Changick Kim, Daejeon (KR); Byeongjun Park, Daejeon (KR); and Changbeom Park, Seoul (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR); and Korea Advanced Institute of Science and Technology, Daejeon (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR); and Korea Advanced Institute of Science and Technology, Daejeon (KR)
Filed on Feb. 10, 2022, as Appl. No. 17/668,513.
Claims priority of application No. 10-2021-0034424 (KR), filed on Mar. 17, 2021; and application No. 10-2021-0058065 (KR), filed on May 4, 2021.
Prior Publication US 2022/0301190 A1, Sep. 22, 2022
Int. Cl. G06T 7/246 (2017.01); G06T 3/18 (2024.01); G06T 3/4007 (2024.01); G06T 7/55 (2017.01); G06V 20/40 (2022.01)
CPC G06T 7/248 (2017.01) [G06T 3/18 (2024.01); G06T 3/4007 (2013.01); G06T 7/55 (2017.01); G06V 20/46 (2022.01); G06T 2207/10016 (2013.01)] 16 Claims
OG exemplary drawing
 
1. A processor-implemented scene flow estimation method, comprising:
receiving a first feature pyramid and a second feature pyramid by encoding a first frame and a second frame of an input image through a same encoder;
extracting a depth feature based on the received first feature pyramid;
extracting a motion feature based on the received first feature pyramid and the received second feature pyramid;
generating an overall feature based on the depth feature and the motion feature; and
estimating a scene flow based on the overall feature,
wherein the extracting of the motion feature comprises:
inputting a first level of the second feature pyramid to a warping layer;
inputting an output of the warping layer and a first level of the first feature pyramid to a correlation layer; and
concatenating an output of the correlation layer and the first level of the first feature pyramid and inputting a result of the concatenating to a correlation regularization module configured to perform a plurality of convolution operations,
wherein the warping layer is configured to adjust a position of each pixel of the first level of the second feature pyramid based on a result of a previous optical flow estimation performed based on the motion feature, and a result of a previous scene flow estimation performed based on the overall feature.
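The motion-feature path recited in claim 1 (warp the second-pyramid level by the previous flow estimate, correlate it with the first-pyramid level, then concatenate) can be sketched as toy Python. This is only an illustrative sketch: the function names, single-channel features, integer flows, and zero-displacement correlation are all assumptions for clarity, not the patent's implementation, which would use learned convolutional layers and a multi-displacement cost volume.

```python
# Toy sketch of the motion-feature extraction in claim 1.
# Features are single-channel H x W lists; flows are per-pixel (dy, dx)
# integer offsets. All names and shapes are illustrative assumptions.

def warp(feature, flow):
    """Warping layer: adjust the position of each pixel of the
    second-pyramid level using the previous flow estimate."""
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = feature[sy][sx]
    return out

def correlate(f1, f2_warped):
    """Correlation layer (zero-displacement only): per-pixel product of
    the first-pyramid level and the warped second-pyramid level."""
    return [[a * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(f1, f2_warped)]

def motion_feature(f1_level, f2_level, prev_flow):
    """Warp, correlate, then concatenate the correlation output with the
    first-pyramid level row by row; a real model would next feed this to
    a correlation regularization module of stacked convolutions."""
    warped = warp(f2_level, prev_flow)
    corr = correlate(f1_level, warped)
    return [r_c + r_f for r_c, r_f in zip(corr, f1_level)]
```

With a zero flow the warp is the identity, so `motion_feature` reduces to concatenating the element-wise correlation with the first-pyramid level itself; a nonzero previous estimate shifts which second-frame pixels are compared, which is what lets the correlation stay local across levels of the pyramid.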