CPC G06T 5/003 (2013.01) [G06N 3/045 (2023.01); G06V 20/46 (2022.01)] | 18 Claims |
1. A computer-implemented method comprising:
receiving an input frame from a digital video;
providing a combined input to a neural network, the combined input including the input frame, a previous input frame, and a corresponding previous output frame;
extracting a plurality of combined features of the combined input using an encoder network;
determining, using the neural network, a plurality of spatial alignment kernels and a plurality of deblur kernels each corresponding to a feature of the plurality of combined features, wherein the plurality of spatial alignment kernels include different sizes of spatial alignment kernels and wherein the plurality of deblur kernels include different sizes of deblur kernels;
padding the plurality of deblur kernels such that each of the plurality of deblur kernels is of equal size;
averaging the plurality of padded deblur kernels;
convolving the average of the plurality of padded deblur kernels with a plurality of features of the input frame to obtain a first convolved result;
generating, by the neural network, a plurality of output features for the input frame using the plurality of spatial alignment kernels and the first convolved result; and
generating a deblurred output frame from the plurality of output features using a decoder network.
|
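For illustration only, the padding, averaging, and convolving steps recited in claim 1 can be sketched in plain Python for a single feature channel. The sketch rests on assumptions the claim does not state: kernels are square with odd side lengths, padding uses zeros placed symmetrically around each kernel, and "convolving" is taken in the neural-network sense (cross-correlation, no kernel flip). The function names `pad_kernel`, `average_kernels`, and `convolve_same` are hypothetical and do not come from the patent.

```python
def pad_kernel(kernel, size):
    """Zero-pad a square kernel symmetrically to size x size (assumed scheme)."""
    k = len(kernel)
    if size < k or (size - k) % 2 != 0:
        raise ValueError("target size must be >= kernel size with the same parity")
    margin = (size - k) // 2
    out = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i + margin][j + margin] = float(kernel[i][j])
    return out


def average_kernels(kernels):
    """Pad every kernel to the largest size present, then average element-wise."""
    size = max(len(k) for k in kernels)
    padded = [pad_kernel(k, size) for k in kernels]
    n = len(padded)
    return [
        [sum(p[i][j] for p in padded) / n for j in range(size)]
        for i in range(size)
    ]


def convolve_same(feature, kernel):
    """Same-size 2D cross-correlation; positions outside the map count as zero."""
    h, w = len(feature), len(feature[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - r, x + j - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * feature[yy][xx]
            out[y][x] = acc
    return out


# Example: a 1x1 and a 3x3 deblur kernel are padded to 3x3 and averaged,
# and the averaged kernel is applied to one feature map of the input frame.
deblur_kernels = [
    [[1.0]],
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
]
feature_map = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
averaged = average_kernels(deblur_kernels)
first_convolved_result = convolve_same(feature_map, averaged)
```

Padding before averaging is what makes the element-wise mean well defined when the kernels differ in size. In the claimed method the kernels are predicted by the neural network for each combined feature, so the fixed values above are stand-ins rather than learned kernels.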