US 12,175,708 B2
Systems and methods for self-supervised learning of camera intrinsic parameters from a sequence of images
Vitor Guizilini, Santa Clara, CA (US); Adrien David Gaidon, Mountain View, CA (US); Rares A. Ambrus, San Francisco, CA (US); Igor Vasiljevic, Fort Lauderdale, FL (US); Jiading Fang, Chicago, IL (US); Gregory Shakhnarovich, Chicago, IL (US); and Matthew R. Walter, Chicago, IL (US)
Assigned to Toyota Research Institute, Inc., Los Altos, CA (US); and Toyota Technological Institute at Chicago, Chicago, IL (US)
Filed by Toyota Research Institute, Inc., Los Altos, CA (US)
Filed on Mar. 11, 2022, as Appl. No. 17/692,357.
Claims priority of provisional application 63/243,463, filed on Sep. 13, 2021.
Prior Publication US 2023/0080638 A1, Mar. 16, 2023
Int. Cl. G06T 3/18 (2024.01); G06T 5/80 (2024.01); G06T 7/50 (2017.01); G06T 7/80 (2017.01); B60W 60/00 (2020.01); B64C 39/02 (2023.01); G05D 1/00 (2006.01); H04N 17/00 (2006.01)
CPC G06T 7/80 (2017.01) [G06T 3/18 (2024.01); G06T 5/80 (2024.01); G06T 7/50 (2017.01); B60W 60/001 (2020.02); B60W 2420/403 (2013.01); B64C 39/024 (2013.01); B64U 2201/00 (2023.01); G05D 1/0246 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/10032 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30244 (2013.01); G06T 2207/30252 (2013.01); H04N 17/002 (2013.01)]
20 Claims
1. A system for self-supervised learning of camera intrinsic parameters from a sequence of images, the system comprising:
a processor; and
a memory storing computer-readable instructions that, when executed by the processor, cause the processor to:
produce a depth map from a current image frame captured by a camera;
generate a point cloud from the depth map using a differentiable unprojection operation based on estimated camera intrinsic parameters of a parametric camera model;
process the current image frame and a context image frame captured by the camera to produce a camera pose estimate;
produce a warped point cloud based on the camera pose estimate;
generate a warped image frame from the warped point cloud using a differentiable projection operation based on the estimated camera intrinsic parameters;
compare the warped image frame with the context image frame to produce a self-supervised photometric loss;
update the estimated camera intrinsic parameters on a per-image-sequence basis using a gradient from the self-supervised photometric loss; and
generate, based on learned camera intrinsic parameters to which the estimated camera intrinsic parameters have converged according to predetermined convergence criteria, a rectified image frame that corrects distortion in an image frame captured by the camera, wherein the convergence criteria include at least updating until a change in the estimated camera intrinsic parameters from iteration to iteration falls below a predetermined threshold.
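The geometric steps recited in claim 1 (unprojection, pose-based warping, projection, and iterating until the intrinsics stop changing) can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the patent: a simple pinhole model with intrinsics (fx, fy, cx, cy) stands in for the claimed parametric camera model, NumPy stands in for a differentiable framework, and the caller supplies the gradient through a hypothetical grad_fn argument instead of the sketch computing a photometric loss. All function names are illustrative.

```python
import numpy as np


def unproject(depth, intrinsics):
    """Lift an (H, W) depth map to an (H, W, 3) point cloud (pinhole assumption)."""
    fx, fy, cx, cy = intrinsics
    h, w = depth.shape
    # u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def warp_points(points, rotation, translation):
    """Apply a rigid camera pose (3x3 rotation, 3-vector translation) to the points."""
    return points @ rotation.T + translation


def project(points, intrinsics):
    """Project (..., 3) points to (..., 2) pixel coordinates (pinhole assumption)."""
    fx, fy, cx, cy = intrinsics
    z = points[..., 2]
    u = fx * points[..., 0] / z + cx
    v = fy * points[..., 1] / z + cy
    return np.stack([u, v], axis=-1)


def fit_intrinsics(grad_fn, init, lr=0.1, tol=1e-6, max_iters=10000):
    """Gradient-descent loop that stops once the per-iteration change in the
    parameters falls below tol.

    grad_fn is a hypothetical callable returning the loss gradient with
    respect to the intrinsics; in the claimed system that gradient would
    come from the self-supervised photometric loss.
    """
    params = np.asarray(init, dtype=float)
    for iteration in range(1, max_iters + 1):
        updated = params - lr * grad_fn(params)
        if np.max(np.abs(updated - params)) < tol:
            return updated, iteration
        params = updated
    return params, max_iters
```

With an identity pose, projecting the unprojected points returns the original pixel grid, which is a quick sanity check that the two operations are inverses under the same intrinsics. The stopping rule in fit_intrinsics mirrors the convergence criterion in the final limitation of the claim; the step size, tolerance, and iteration cap are placeholder values.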