US 12,243,145 B2
Re-timing objects in video via layered neural rendering
Forrester H. Cole, Cambridge, MA (US); Erika Lu, Lexington, MA (US); Tali Dekel, Arlington, MA (US); William T. Freeman, Acton, MA (US); David Henry Salesin, Sausalito, CA (US); and Michael Rubinstein, Natick, MA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 17/927,101
Filed by Google LLC, Mountain View, CA (US)
PCT Filed May 22, 2020, PCT No. PCT/US2020/034296
§ 371(c)(1), (2) Date Nov. 22, 2022,
PCT Pub. No. WO2021/236104, PCT Pub. Date Nov. 25, 2021.
Prior Publication US 2023/0206955 A1, Jun. 29, 2023
Int. Cl. G06T 13/80 (2011.01); G06V 10/44 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G11B 27/00 (2006.01); G11B 27/031 (2006.01)
CPC G06T 13/80 (2013.01) [G06V 10/454 (2022.01); G06V 10/82 (2022.01); G06V 20/46 (2022.01); G06V 20/49 (2022.01); G11B 27/005 (2013.01); G11B 27/031 (2013.01)]
22 Claims
 
1. A computer-implemented method for decomposing videos into multiple layers that can be individually retimed and re-combined with modified relative timings, the computer-implemented method comprising:
obtaining, by a computing system comprising one or more computing devices, video data, the video data comprising a plurality of image frames depicting one or more objects; and
for each of the plurality of image frames:
generating, by the computing system, one or more object maps, wherein each of the one or more object maps is descriptive of a respective location of at least one object of the one or more objects within the image frame;
inputting, by the computing system, the image frame and the one or more object maps into a machine-learned layer renderer model, comprising iteratively individually inputting each of the one or more object maps into the machine-learned layer renderer model;
receiving, by the computing system as output from the machine-learned layer renderer model, a background layer illustrative of a background of the video data and one or more object layers respectively associated with one of the one or more object maps, wherein each of the one or more object layers comprises image data illustrative of the at least one object and one or more trace effects at least partially attributable to the at least one object; and
generating, by the computing system, a retimed video by:
retiming at least one of the background layer or the one or more object layers; and
re-combining the one or more retimed layers.
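
The final two steps of claim 1 (retiming layers, then re-combining them) can be illustrated with a minimal sketch. It is not the claimed implementation, and it does not model the machine-learned layer renderer. It assumes, for illustration only, that each object layer is already available as a per-frame RGBA array with alpha in [0, 1], that the background is a per-frame RGB array, that retiming is a per-layer remapping of frame indices, and that layers are re-combined by standard back-to-front "over" compositing. The names `retime`, `composite`, `retimed_video`, and `frame_maps` are hypothetical.

```python
import numpy as np


def retime(layer_frames, frame_map):
    """Retime one layer: output frame t shows source frame frame_map[t].

    Indices outside the clip are clamped to the first or last frame.
    """
    last = len(layer_frames) - 1
    return [layer_frames[min(max(i, 0), last)] for i in frame_map]


def composite(background, object_layers):
    """Re-combine layers for one frame with back-to-front 'over' compositing.

    Assumption: background is (H, W, 3) RGB and each object layer is
    (H, W, 4) RGBA with alpha in [0, 1]; the claim itself only says
    that layers comprise "image data".
    """
    out = background.astype(np.float64)
    for layer in object_layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out
    return out


def retimed_video(background_frames, object_layer_frames, frame_maps):
    """Retime each object layer independently, then re-combine per frame.

    object_layer_frames: one list of RGBA frames per object layer.
    frame_maps: one list of source-frame indices per object layer,
    all of the same length (the length of the output video).
    The background is left at its original timing in this sketch.
    """
    n_out = len(frame_maps[0])
    retimed = [
        retime(frames, fmap)
        for frames, fmap in zip(object_layer_frames, frame_maps)
    ]
    last_bg = len(background_frames) - 1
    return [
        composite(background_frames[min(t, last_bg)],
                  [layer[t] for layer in retimed])
        for t in range(n_out)
    ]
```

For example, passing `frame_maps=[[0, 0, 1]]` for a single three-frame object layer holds that object on its first frame for one extra output frame, delaying it relative to the background. Because each object layer in the claim also carries trace effects attributable to its object, such as shadows or reflections, remapping the layer as a whole would move those effects together with the object.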