CPC G06T 5/77 (2024.01) [G06T 5/50 (2013.01); G06T 7/11 (2017.01); G06T 7/20 (2013.01); G06T 11/00 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/20221 (2013.01); G06T 2207/30196 (2013.01); G06T 2207/30244 (2013.01)] | 20 Claims |
1. A method comprising:
receiving, by one or more processors, a video that includes a depiction of a real-world object in a real-world environment;
removing a depiction of the real-world object from a region of a first frame of the video; and
processing, by a machine learning model, the first frame and one or more previous frames of the video that precede the first frame to generate a new frame in which portions of the first frame have been blended into the region from which the depiction of the real-world object has been removed, the processing comprising:
generating a first plurality of features associated with the first frame and a second plurality of features associated with a previous frame of the one or more previous frames; and
combining a first subset of features of the first plurality of features and a second subset of features of the second plurality of features into a combined subset of features based on a weighted average of the first subset of features and the second subset of features to generate the new frame.
|
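The combining step recited at the end of claim 1 can be illustrated with a minimal sketch. The claim does not say how features are represented, which features make up each subset, or how the weights are chosen, so everything here is an assumption for illustration only: features are flat lists of floats, the subsets are picked by a shared list of indices, and one scalar weight (the hypothetical `weight_current`) is applied to the first frame, with the remainder applied to the previous frame.

```python
from typing import List, Sequence


def combine_feature_subsets(
    current: Sequence[float],
    previous: Sequence[float],
    indices: Sequence[int],
    weight_current: float = 0.5,
) -> List[float]:
    """Return the weighted average of the selected features of two frames.

    Hypothetical helper: `indices` selects the same positions in both
    feature vectors, and the two weights sum to 1.
    """
    if len(current) != len(previous):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= weight_current <= 1.0:
        raise ValueError("weight_current must lie in [0, 1]")
    weight_previous = 1.0 - weight_current
    return [
        weight_current * current[i] + weight_previous * previous[i]
        for i in indices
    ]


# Toy feature vectors standing in for the first frame and a previous frame.
current_features = [1.0, 2.0, 3.0, 4.0]
previous_features = [3.0, 2.0, 1.0, 0.0]

# Combine the features at positions 1..3, favouring the first frame.
combined = combine_feature_subsets(
    current_features, previous_features, [1, 2, 3], weight_current=0.75
)
print(combined)  # [2.0, 2.5, 3.0]
```

In a real system the features would presumably be tensors produced by the machine learning model and the weights might be learned or vary per feature; the scalar weight above only shows the arithmetic of a weighted average.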