| CPC G06V 20/46 (2022.01) [G06T 3/40 (2013.01); G06V 10/761 (2022.01); G06V 10/82 (2022.01); G06V 20/48 (2022.01)] | 24 Claims |

|
1. A processor-implemented method for processing a video, performed by at least one processor, the processor-implemented method comprising:
receiving the video as an input at an artificial neural network (ANN), the video comprising a sequence of frames;
extracting a set of features of a current frame of the video and a set of support features of a prior frame of the video, the set of support features comprising at least one support feature associated with a neighborhood of pixels of the prior frame, where the neighborhood of pixels of the prior frame is determined based on a position of a reference pixel of the current frame and where the neighborhood of pixels comprises a subset of pixels of the prior frame, the subset of pixels being less than an entirety of pixels of the prior frame;
computing a similarity metric between the at least one support feature, associated with the neighborhood of pixels, of the set of support features of the prior frame and a reference feature of the reference pixel of the current frame;
generating an attention map based on the similarity metric; and
generating an output comprising a reconstruction of the current frame based on the attention map.
|
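The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation, and everything in it is an assumption made for illustration: the function names (`neighborhood`, `softmax`, `local_attention_reconstruct`), the square window set by a hypothetical `radius` parameter, the dot product as the similarity metric, softmax normalization for the attention map, and reconstruction as an attention-weighted sum of prior-frame pixel values. Feature extraction by the ANN is assumed to have already happened, so the features arrive as plain nested lists.

```python
import math


def neighborhood(y, x, height, width, radius):
    """Prior-frame pixel coordinates within `radius` of (y, x), clipped to the frame.

    The window is a strict subset of the frame whenever it is smaller than the frame.
    """
    return [
        (ny, nx)
        for ny in range(max(0, y - radius), min(height, y + radius + 1))
        for nx in range(max(0, x - radius), min(width, x + radius + 1))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention_reconstruct(cur_feats, prev_feats, prev_values, radius=1):
    """Reconstruct current-frame values from a local window of the prior frame.

    cur_feats, prev_feats: H x W grids of feature vectors (lists of floats).
    prev_values: H x W grid of scalar prior-frame pixel values (assumed input).
    """
    height, width = len(cur_feats), len(cur_feats[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            reference = cur_feats[y][x]  # reference feature of the reference pixel
            coords = neighborhood(y, x, height, width, radius)
            # Similarity metric: dot product between the reference feature and
            # each support feature in the neighborhood (an assumed choice).
            scores = [
                sum(a * b for a, b in zip(reference, prev_feats[ny][nx]))
                for ny, nx in coords
            ]
            # Attention map for this reference pixel: normalized similarities.
            weights = softmax(scores)
            # Reconstruction: attention-weighted sum of prior-frame values.
            output[y][x] = sum(
                w * prev_values[ny][nx] for w, (ny, nx) in zip(weights, coords)
            )
    return output
```

Restricting the comparison to a window around the reference pixel keeps the cost per pixel proportional to the window size instead of the full frame size, which appears to be the point of the "less than an entirety of pixels" limitation. With all-zero features the attention weights are uniform, so each output pixel is simply the mean of its prior-frame window.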