US 11,783,500 B2
Unsupervised depth prediction neural networks
Vincent Michael Casser, Cambridge, MA (US); Soeren Pirk, Palo Alto, CA (US); Reza Mahjourian, Austin, TX (US); and Anelia Angelova, Sunnyvale, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Appl. No. 17/272,419
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Sep. 5, 2019, PCT No. PCT/US2019/049643
§ 371(c)(1), (2) Date Mar. 1, 2021,
PCT Pub. No. WO2020/051270, PCT Pub. Date Mar. 12, 2020.
Claims priority of provisional application 62/727,502, filed on Sep. 5, 2018.
Prior Publication US 2021/0319578 A1, Oct. 14, 2021
Int. Cl. G06T 7/00 (2017.01); G06T 7/55 (2017.01); G06T 7/246 (2017.01); G06N 3/088 (2023.01); G06T 3/00 (2006.01); G06N 3/045 (2023.01)
CPC G06T 7/55 (2017.01) [G06N 3/045 (2023.01); G06N 3/088 (2013.01); G06T 3/0093 (2013.01); G06T 7/248 (2017.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A system comprising one or more computers in one or more locations and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to:
receive a sequence of input images that depict the same scene, the input images being captured by a camera at different time steps, each of the input images including one or more potential objects;
generate, for each of the input images, a respective background image that includes portions of the input image that do not depict any of the potential objects in the input image, comprising:
for each input image, generating a respective object segmentation mask for each of the potential objects in the input image,
for each input image, generating a background segmentation mask based on the object segmentation masks generated for the potential objects in the input image, and
for each input image, generating the respective background image for the input image based on a combination of the background segmentation masks generated for the input images in the sequence and the input image;
process the background images using a camera motion estimation neural network to generate a camera motion output that characterizes the motion of the camera between the input images in the sequence;
for each of the one or more potential objects: generate, using an object motion estimation neural network, a respective object motion output for the potential object based on the sequence of input images, the respective object motion output characterizing movement of the potential object between its positions as depicted in the input images;
process a particular input image of the sequence of input images using a depth prediction neural network and in accordance with current values of parameters of the depth prediction neural network to generate a depth output for the particular input image; and
update the current values of the parameters of the depth prediction neural network based on (i) the depth output for the particular input image, (ii) the camera motion output, and (iii) the object motion outputs for the one or more potential objects.
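The background-image steps recited above can be sketched as follows. The array shapes, the union of per-object masks, the intersection rule for combining background masks across the sequence, and all function names are illustrative assumptions, not the claimed implementation:

```python
import numpy as np

# Hypothetical sketch of the claimed background-image generation: union the
# object segmentation masks within each image, invert to get that image's
# background segmentation mask, combine the background masks across the
# sequence (here by intersection), and mask each input image with the result.
def sequence_background_images(images, object_masks):
    # images: list of (H, W, 3) arrays; object_masks: per image, an
    # (N, H, W) array of {0, 1} masks, one per potential object.
    bg_masks = [1.0 - np.clip(masks.sum(axis=0), 0.0, 1.0) for masks in object_masks]
    combined = np.prod(np.stack(bg_masks), axis=0)  # background in every frame
    return [img * combined[..., None] for img in images]
```

Masking every frame with the same combined background keeps the camera motion network's input free of pixels that any potentially moving object touches in any frame of the sequence.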
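One way the camera motion output and the per-object motion outputs could be used together when reconstructing a frame is sketched below. Reducing all motions to integer 1-D pixel shifts and the additive compositing rule are assumptions made purely for illustration:

```python
import numpy as np

# Hypothetical illustration of combining the two motion outputs: background
# pixels move with the camera, while each potential object's pixels
# additionally move with that object's own motion output.
def reconstruct(source, object_masks, camera_shift, object_shifts):
    # source: 1-D intensity array; object_masks: (N, W) {0, 1} masks;
    # camera_shift / object_shifts: integer pixel shifts (toy motion model).
    background_mask = 1 - np.clip(object_masks.sum(axis=0), 0, 1)
    frame = np.roll(source * background_mask, camera_shift)
    for mask, shift in zip(object_masks, object_shifts):
        frame = frame + np.roll(source * mask, camera_shift + shift)
    return frame
```

Separating the two motions is what lets the camera motion network be trained on background images alone while the object motion network accounts for the independently moving regions.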
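The training signal behind the final two steps is view synthesis: warp one frame toward another using the predicted depth and the motion outputs, and penalize the photometric difference, which requires no depth labels. The toy below replaces the depth prediction neural network with a single scalar depth hypothesis and replaces gradient descent with exhaustive search over three candidates; every name and modeling choice here is an assumption for illustration, not the patented method:

```python
import numpy as np

def warp_1d(source, depth, camera_shift):
    # Toy pinhole intuition: the induced pixel disparity shrinks as depth grows.
    return np.roll(source, int(round(camera_shift / depth)))

def photometric_loss(depth, target, source, camera_shift):
    # Photometric consistency between the warped source and the target frame.
    return np.abs(warp_1d(source, depth, camera_shift) - target).mean()

# Synthesize a source frame consistent with a known true depth of 2.0.
true_depth, camera_shift = 2.0, 4.0
target = np.array([0., 0., 1., 1., 0., 0., 0., 0.])
source = np.roll(target, -int(camera_shift / true_depth))

# "Update the parameters": keep the depth hypothesis with the lowest loss.
best = min([1.0, 2.0, 4.0],
           key=lambda d: photometric_loss(d, target, source, camera_shift))
```

Only the correct depth makes the warped source agree with the target, so minimizing the photometric loss drives the depth estimate toward the true geometry, which is the unsupervised signal used to update the depth network's parameters.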