CPC G06T 7/55 (2017.01) [G01B 11/22 (2013.01); G06T 3/18 (2024.01); G06T 7/73 (2017.01); G06T 11/00 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] | 14 Claims |
1. A computer-implemented method comprising:
receiving a time series of images of a scene including a primary image and an additional image from an earlier time than the primary image, wherein the time series of images are monocular images derived from monocular video;
inputting the time series of images into a depth estimation model;
receiving, as output from the depth estimation model, a depth map of the primary image, the depth map generated based on a cost volume concatenating differences between a primary feature map of the primary image and a plurality of warped feature maps of the additional image for each of a plurality of depth planes, wherein receiving the depth map as output from the depth estimation model comprises:
generating a primary feature map for the primary image and an additional feature map for the additional image;
generating a warped feature map comprising a plurality of warped feature map layers, each warped feature map layer generated by warping the additional feature map to a respective depth plane of the plurality of depth planes based on (1) the depth plane to which the additional feature map is being warped, (2) a relative pose between the primary image and the additional image, and (3) intrinsics of a camera used to capture the primary image and the additional image;
for each warped feature map layer, calculating a difference between the warped feature map layer and the primary feature map; and
building the cost volume by concatenating the differences between layers of the warped feature map and the primary feature map;
wherein the output is based on the cost volume and the primary feature map;
generating virtual content using the depth map; and
displaying an image of the scene augmented with the virtual content.
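The following is a minimal NumPy sketch, for illustration only, of the cost-volume construction recited in claim 1. The function names (warp_to_depth_plane, build_cost_volume), the absolute-difference cost, the nearest-neighbour sampling, and the assumption that both images share one set of camera intrinsics are illustrative choices not specified by the claim; the claim only requires warping the additional feature map to each depth plane using the depth plane, the relative pose, and the camera intrinsics, differencing each warped layer against the primary feature map, and concatenating the differences.

import numpy as np

def warp_to_depth_plane(feat_add, depth, K, R, t):
    """Warp the additional image's feature map onto a fronto-parallel depth
    plane of the primary camera (plane-sweep warp).

    feat_add : (C, H, W) feature map of the additional (earlier) image
    depth    : scalar depth hypothesis in the primary camera frame
    K        : (3, 3) camera intrinsics (assumed shared by both images)
    R, t     : relative pose mapping primary-frame points into the
               additional camera frame (R: 3x3, t: (3,))
    Returns a (C, H, W) warped feature map (nearest-neighbour sampling).
    """
    C, H, W = feat_add.shape
    # Pixel grid of the primary feature map in homogeneous coordinates.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)

    # Back-project primary pixels onto the hypothesised depth plane, move the
    # 3D points into the additional camera's frame, and re-project them.
    K_inv = np.linalg.inv(K)
    pts = depth * (K_inv @ pix)                        # points in primary frame
    pts = R @ pts + t[:, None]                         # points in additional frame
    proj = K @ pts
    proj = proj[:2] / np.clip(proj[2:], 1e-6, None)    # (2, H*W) pixel coordinates

    # Nearest-neighbour lookup into the additional feature map.
    x = np.clip(np.round(proj[0]), 0, W - 1).astype(int)
    y = np.clip(np.round(proj[1]), 0, H - 1).astype(int)
    return feat_add[:, y, x].reshape(C, H, W)

def build_cost_volume(feat_primary, feat_add, depth_planes, K, R, t):
    """Concatenate, over depth planes, the per-plane difference between the
    warped additional feature map and the primary feature map."""
    layers = []
    for d in depth_planes:
        warped = warp_to_depth_plane(feat_add, d, K, R, t)
        layers.append(np.abs(warped - feat_primary))   # one cost-volume layer
    return np.concatenate(layers, axis=0)              # (num_planes * C, H, W)

if __name__ == "__main__":
    # Toy feature maps and a placeholder relative pose, for shape checking only.
    C, H, W = 16, 24, 32
    rng = np.random.default_rng(0)
    feat_primary = rng.standard_normal((C, H, W))
    feat_add = rng.standard_normal((C, H, W))
    K = np.array([[30.0, 0.0, W / 2], [0.0, 30.0, H / 2], [0.0, 0.0, 1.0]])
    R = np.eye(3)
    t = np.array([0.05, 0.0, 0.0])
    depth_planes = np.linspace(0.5, 4.0, 8)
    cost_volume = build_cost_volume(feat_primary, feat_add, depth_planes, K, R, t)
    print(cost_volume.shape)                           # (8 * 16, 24, 32)

In the claimed method, the resulting cost volume, together with the primary feature map, would then be supplied to the depth estimation model to produce the depth map of the primary image; the network architecture that consumes the cost volume is not shown here.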