US 11,055,828 B2
Video inpainting with deep internal learning
Long Mai, San Jose, CA (US); Zhaowen Wang, San Jose, CA (US); Ning Xu, Milpitas, CA (US); John Philip Collomosse, Surrey (GB); Haotian Zhang, Stanford, CA (US); and Hailin Jin, San Jose, CA (US)
Assigned to ADOBE INC., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on May 9, 2019, as Appl. No. 16/407,915.
Prior Publication US 2020/0357099 A1, Nov. 12, 2020
Int. Cl. G06T 5/00 (2006.01); G06N 3/04 (2006.01); G06K 9/62 (2006.01)
CPC G06T 5/005 (2013.01) [G06K 9/6256 (2013.01); G06N 3/04 (2013.01); G06T 5/002 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01)] 19 Claims
OG exemplary drawing
 
1. A method, comprising:
receiving, by processing circuitry of a computer configured to perform video inpainting, initial video data representing a sequence of initial frames;
generating, by the processing circuitry, a sequence of inputs, each of the sequence of inputs corresponding to a respective initial frame of the sequence of initial frames and having respective input values, wherein each of the sequence of initial frames includes a content region and a mask region, the content region including target image values;
generating, by the processing circuitry, a neural network including an initial layer and a final layer, the input values of each of the sequence of inputs representing the initial layer of the neural network;
performing, by the processing circuitry, a training operation on the neural network to produce final video data including a sequence of final frames representing the final layer of the neural network and a plurality of optical flows, wherein the training operation is based on a loss function including a weighted sum of a plurality of components, the plurality of components including an image generation loss function based on differences between estimated image values of the sequence of final frames and the target image values in the content region of the sequence of initial frames, and wherein performing the training operation on the neural network includes performing a minimization operation on the loss function to produce a set of parameters of the neural network that minimizes the loss function; and
generating, by the processing circuitry, video content based on the final video data, the final video data comprising an inpainted version of the initial video data.
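The training operation recited in claim 1 can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical illustration only, not the patented implementation: it assumes a small convolutional generator, fixed per-frame noise inputs standing in for the "initial layer," and a single L2 image generation loss restricted to the content region via the mask. The names make_generator, inpaint_clip, and noise_ch are illustrative, and the claimed optical-flow outputs and the remaining weighted loss components are omitted for brevity.

```python
# Hypothetical sketch of per-video "internal learning" inpainting (not the patented code).
# A small convolutional generator is trained only on the known (unmasked) pixels of one
# input clip; its output over the full frames serves as the inpainted video.
import torch
import torch.nn as nn

def make_generator(in_ch=32, out_ch=3):
    # Minimal encoder-decoder stand-in; the patent does not fix a specific architecture here.
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, out_ch, 3, padding=1), nn.Sigmoid(),
    )

def inpaint_clip(frames, masks, steps=2000, lr=1e-3, noise_ch=32):
    """frames: (T, 3, H, W) in [0, 1]; masks: (T, 1, H, W) with 1 = known content, 0 = hole."""
    T, _, H, W = frames.shape
    inputs = torch.randn(T, noise_ch, H, W)   # fixed per-frame inputs ("initial layer")
    net = make_generator(noise_ch)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = net(inputs)                     # estimated frames ("final layer")
        # image generation loss: penalize error only where target image values are known
        loss = ((out - frames) ** 2 * masks).sum() / masks.sum().clamp(min=1.0)
        # a full system would add further weighted terms (e.g., flow-based consistency)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return net(inputs)                    # inpainted frames over the whole video
```

In this reading, the minimization over the network parameters is the claimed training operation, and the network's output over the entire frame, including the mask region, constitutes the final video data; a fuller sketch would add flow-prediction outputs and additional weighted loss terms to the same objective.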