US 12,266,179 B2
Efficient self-attention for video processing
Davide Abati, Amsterdam (NL); Amirhossein Habibian, Amsterdam (NL); and Amir Ghodrati, Amsterdam (NL)
Assigned to QUALCOMM INCORPORATED, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Mar. 16, 2022, as Appl. No. 17/696,797.
Claims priority of provisional application 63/162,478, filed on Mar. 17, 2021.
Prior Publication US 2022/0301311 A1, Sep. 22, 2022
Int. Cl. G06V 20/40 (2022.01); G06T 3/40 (2024.01); G06V 10/74 (2022.01); G06V 10/82 (2022.01)
CPC G06V 20/46 (2022.01) [G06T 3/40 (2013.01); G06V 10/761 (2022.01); G06V 10/82 (2022.01); G06V 20/48 (2022.01)]
24 Claims
 
1. A processor-implemented method for processing a video, performed by at least one processor, the processor-implemented method comprising:
receiving the video as an input at an artificial neural network (ANN), the video comprising a sequence of frames;
extracting a set of features of a current frame of the video and a set of support features of a prior frame of the video, the set of support features comprising at least one support feature associated with a neighborhood of pixels of the prior frame, where the neighborhood of pixels of the prior frame is determined based on a position of a reference pixel of the current frame and where the neighborhood of pixels comprises a subset of pixels of the prior frame, the subset of pixels being less than an entirety of pixels of the prior frame;
computing a similarity metric between the at least one support feature, associated with the neighborhood of pixels, of the set of support features of the prior frame and a reference feature of the reference pixel of the current frame;
generating an attention map based on the similarity metric; and
generating an output comprising a reconstruction of the current frame based on the attention map.
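
The steps of claim 1 can be illustrated with a minimal NumPy sketch. It is only one possible reading of the claim, not the patented implementation: the claim does not say how the similarity metric is computed, how the attention map is normalized, what shape the neighborhood has, or how the reconstruction is formed. The sketch assumes a scaled dot product as the similarity metric, a softmax to turn similarities into attention weights, a square window of half-width `radius` centered on the reference pixel position, and a reconstruction formed as the attention-weighted sum of prior-frame values. The function name `local_attention_reconstruct` and all parameter names are hypothetical, and the feature extraction performed by the ANN is replaced by precomputed arrays.

```python
import numpy as np


def local_attention_reconstruct(ref_feats, sup_feats, sup_values, radius=1):
    """Reconstruct the current frame from a prior frame with local attention.

    ref_feats:  (H, W, C) features of the current frame (one per reference pixel).
    sup_feats:  (H, W, C) support features of the prior frame.
    sup_values: (H, W, D) prior-frame values to aggregate (assumed, e.g. pixels).
    radius:     half-width of the square neighborhood (an assumption; it should
                be small enough that the window covers only part of the frame).
    """
    H, W, C = ref_feats.shape
    D = sup_values.shape[-1]
    out = np.zeros((H, W, D), dtype=float)

    for y in range(H):
        for x in range(W):
            # Neighborhood of the prior frame, chosen from the position of the
            # reference pixel (y, x) and clipped at the frame border.
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            keys = sup_feats[y0:y1, x0:x1].reshape(-1, C)
            vals = sup_values[y0:y1, x0:x1].reshape(-1, D)

            # Similarity metric (assumed): scaled dot product between each
            # support feature and the reference feature.
            scores = keys @ ref_feats[y, x] / np.sqrt(C)

            # Attention map over the neighborhood (assumed): stable softmax.
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()

            # Reconstruction (assumed): attention-weighted sum of prior values.
            out[y, x] = weights @ vals

    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(4, 5, 3))     # stand-in current-frame features
    sup = rng.normal(size=(4, 5, 3))     # stand-in prior-frame support features
    prior = rng.normal(size=(4, 5, 2))   # stand-in prior-frame values
    recon = local_attention_reconstruct(ref, sup, prior, radius=1)
    print(recon.shape)
```

Because each reference pixel attends only to a window of at most (2 * radius + 1)^2 prior-frame positions, the number of similarity computations grows linearly with the number of pixels rather than quadratically, which is the practical motivation for restricting the neighborhood to a subset of the prior frame.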