US 11,734,847 B2
Image depth prediction neural networks
Anelia Angelova, Sunnyvale, CA (US); Martin Wicke, San Francisco, CA (US); and Reza Mahjourian, Austin, TX (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Jan. 15, 2021, as Appl. No. 17/150,291.
Application 17/150,291 is a continuation of application No. 16/332,991, granted, now U.S. Pat. No. 10,929,996, previously published as PCT/US2017/051070, filed on Sep. 12, 2017.
Claims priority of provisional application 62/395,326, filed on Sep. 15, 2016.
Prior Publication US 2021/0233265 A1, Jul. 29, 2021
Int. Cl. G06T 7/55 (2017.01); G06T 3/40 (2006.01); G06N 3/08 (2023.01); G06T 15/20 (2011.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06T 7/579 (2017.01)
CPC G06T 7/55 (2017.01) [G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06T 3/40 (2013.01); G06T 15/205 (2013.01); G06T 7/579 (2017.01); G06T 2207/10016 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30244 (2013.01)]
20 Claims
 
1. A system comprising:
an image depth prediction neural network implemented by one or more computers, wherein the image depth prediction neural network is a recurrent neural network that is configured to receive a sequence of images and, for each image in the sequence:
process the image, which is a current image at a first time step in the sequence, in accordance with a current internal state of the recurrent neural network to (i) update the current internal state and (ii) generate a current depth map that characterizes a current depth of the image in the sequence; and
an image generation subsystem configured to, for each image in the sequence:
receive the current depth map that characterizes the current depth of the image,
construct, based on the current depth map and the image, a plurality of three-dimensional (3D) points, each of the plurality of 3D points corresponding to a different pixel in the image, and
generate a depth output that characterizes a predicted depth of a future image in the sequence by applying one or more transformation layers to the plurality of 3D points, wherein the depth output comprises a set of values defining the topology of a scene represented by the future image in a third, depth dimension.
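
The image generation subsystem recited above can be illustrated with a minimal sketch, offered only as one possible reading and not as the claimed implementation. It lifts each pixel of a current depth map to a 3D point, applies a single transformation to those points, and re-projects them to obtain a depth output for a future image. The claim does not specify a camera model or the form of the transformation layers, so the pinhole intrinsics (fx, fy, cx, cy), the rigid rotation-plus-translation transform, and the function names backproject, transform, and project_depth are all assumptions introduced for illustration. The recurrent depth prediction network itself is not modeled; the current depth map is simply given as input.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point3D]:
    """Lift each pixel (u, v) with depth z to one 3D point.

    Assumes a pinhole camera; the intrinsics are hypothetical parameters.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def transform(points: List[Point3D], rotation: Sequence[Sequence[float]],
              translation: Sequence[float]) -> List[Point3D]:
    """Apply one assumed rigid transformation, p' = R p + t, to every point."""
    out = []
    for p in points:
        out.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)))
    return out


def project_depth(points: List[Point3D], height: int, width: int,
                  fx: float, fy: float, cx: float, cy: float) -> List[List[float]]:
    """Re-project 3D points to a depth map, keeping the nearest point per pixel.

    Pixels that receive no point keep the value infinity.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            depth[v][u] = min(depth[v][u], z)
    return depth


# Usage: a flat 2x2 scene at depth 2.0, with the scene shifted 0.5 toward the camera.
depth = [[2.0, 2.0], [2.0, 2.0]]
intrinsics = dict(fx=1.0, fy=1.0, cx=0.5, cy=0.5)  # hypothetical values
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

points = backproject(depth, **intrinsics)
moved = transform(points, identity, [0.0, 0.0, -0.5])
future = project_depth(moved, 2, 2, **intrinsics)
print(future)
```

In this sketch each 3D point corresponds to a different pixel, as the claim requires, and the resulting grid of values plays the role of the depth output for the future image. In a learned system the transformation would presumably be produced or parameterized by trained layers rather than fixed by hand.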