US 12,260,576 B2
Unsupervised depth prediction neural networks
Vincent Michael Casser, Cambridge, MA (US); Soeren Pirk, Palo Alto, CA (US); Reza Mahjourian, Austin, TX (US); and Anelia Angelova, Sunnyvale, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Sep. 13, 2023, as Appl. No. 18/367,888.
Application 18/367,888 is a continuation of application No. 17/272,419, granted, now Patent No. 11,783,500, previously published as PCT/US2019/049643, filed on Sep. 5, 2019.
Claims priority of provisional application 62/727,502, filed on Sep. 5, 2018.
Prior Publication US 2023/0419521 A1, Dec. 28, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 7/00 (2017.01); G06N 3/045 (2023.01); G06N 3/088 (2023.01); G06T 3/18 (2024.01); G06T 7/246 (2017.01); G06T 7/55 (2017.01)
CPC G06T 7/55 (2017.01) [G06N 3/045 (2023.01); G06N 3/088 (2013.01); G06T 3/18 (2024.01); G06T 7/248 (2017.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] 19 Claims
OG exemplary drawing
 
1. A system comprising one or more computers in one or more locations and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for online refinement of a depth prediction neural network during inference of the depth prediction neural network, the operations comprising:
performing one or more iterations of a set of inference operations, the set of inference operations comprising:
receiving a sequence of unseen input images that depict a same scene of a new environment, the sequence of input images comprising a first input image, a second input image, and a third input image, the input images being captured by a camera at different time steps, each of the input images including one or more potential objects;
generating, for each of the input images, a respective background image that includes portions of the input image that do not depict any of the potential objects in the input image;
processing the background images using a camera motion estimation neural network to generate a camera motion output that characterizes the motion of the camera between the input images in the sequence;
generating a sequence of warped images by applying a warping operation to the sequence of input images using the camera motion output, wherein each warped image is generated by applying the warping operation to two different images in the sequence;
for each of the one or more potential objects: generating, using an object motion estimation neural network, a respective object motion output for the potential object based on the sequence of warped images, the respective object motion output characterizing movements of the potential object between its positions as they appear in the input images; and
processing a particular input image of the sequence of input images using a depth prediction neural network and in accordance with current values of parameters of the depth prediction neural network to generate a depth output for the particular input image; and
after each iteration of the set of inference operations, performing online refinement of the depth prediction neural network by updating the current values of the parameters of the depth prediction neural network based on (i) the depth output for the particular input image, (ii) the camera motion output, and (iii) the object motion outputs for the one or more potential objects.
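
The warping operation recited in the claim is a standard differentiable view-synthesis step: each pixel of the target frame is back-projected into 3D using the predicted depth and the camera intrinsics, moved by the estimated camera motion, and re-projected into the neighboring frame, which is then bilinearly sampled at the resulting coordinates. The following is a minimal PyTorch sketch of that idea only, not the patented implementation; the intrinsics K, the (R, t) parameterization of the camera motion output, and the function name warp_image are assumptions made for illustration:

    import torch
    import torch.nn.functional as F

    def warp_image(src_img, tgt_depth, K, R, t):
        # Warp src_img into the target view using the target-frame depth map and the
        # camera motion (R, t) from the target frame to the source frame.
        # Illustrative sketch only; K, R, t shapes and conventions are assumptions.
        b, _, h, w = src_img.shape
        # Pixel grid in homogeneous coordinates, shape (b, 3, h*w).
        ys, xs = torch.meshgrid(torch.arange(h, device=src_img.device),
                                torch.arange(w, device=src_img.device), indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()
        pix = pix.view(3, -1).unsqueeze(0).expand(b, -1, -1)
        # Back-project to 3D with the predicted depth, then move into the source frame.
        cam_points = torch.inverse(K) @ pix * tgt_depth.view(b, 1, -1)
        cam_points = R @ cam_points + t.view(b, 3, 1)
        # Re-project into the source image plane.
        proj = K @ cam_points
        px = proj[:, 0] / (proj[:, 2] + 1e-7)
        py = proj[:, 1] / (proj[:, 2] + 1e-7)
        # Normalize coordinates to [-1, 1] and bilinearly sample the source image.
        grid = torch.stack([2 * px / (w - 1) - 1, 2 * py / (h - 1) - 1], dim=-1)
        return F.grid_sample(src_img, grid.view(b, h, w, 2), align_corners=True)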
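
The inference-plus-refinement cycle of the claim can likewise be sketched at a high level: mask out the potential objects to form background images, estimate ego-motion from them, warp the neighboring frames into the middle view, estimate residual object motion from the warped sequence, and take a gradient step on the depth network's parameters using a reconstruction loss. The sketch below is a simplified illustration under several assumptions: the network modules, the per-frame object masks, the pairwise ego-motion estimation, and the purely photometric loss are placeholders rather than the claimed implementation, a faithful version would also use the object motion outputs when forming the reconstruction, and warp_image refers to the sketch above:

    import torch

    def online_refine_step(depth_net, camera_motion_net, object_motion_net,
                           frames, masks, K, optimizer):
        # frames: three consecutive RGB tensors (b, 3, h, w); masks: object masks (b, 1, h, w).
        f1, f2, f3 = frames
        # Background images: keep only the parts that do not depict any potential object.
        backgrounds = [f * (1 - m) for f, m in zip(frames, masks)]
        # Ego-motion from the middle frame to each neighbor, estimated from the backgrounds
        # (pairwise here for simplicity; the claim feeds the background sequence jointly).
        R21, t21 = camera_motion_net(backgrounds[1], backgrounds[0])
        R23, t23 = camera_motion_net(backgrounds[1], backgrounds[2])
        # Depth for the middle ("particular") frame under the current parameter values.
        depth2 = depth_net(f2)
        # Warped images: neighbors rendered into the middle view using ego-motion only.
        warp1 = warp_image(f1, depth2, K, R21, t21)
        warp3 = warp_image(f3, depth2, K, R23, t23)
        # Residual per-object motion from the warped sequence (not folded into the loss here).
        obj_motion = object_motion_net(torch.cat([warp1, f2, warp3], dim=1), masks[1])
        # Photometric reconstruction loss drives the online update of the depth network.
        loss = (warp1 - f2).abs().mean() + (warp3 - f2).abs().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return depth2.detach(), obj_motion, loss.item()

In this arrangement the optimizer would be constructed over depth_net.parameters() only, so the update in the final step of the claim changes the depth prediction network's current parameter values while leaving the motion networks fixed.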