| CPC G06V 10/761 (2022.01) [G06T 7/248 (2017.01); G06T 7/74 (2017.01); G06V 10/62 (2022.01); G06V 10/751 (2022.01); G06V 10/759 (2022.01); G06V 10/764 (2022.01); G06V 10/7715 (2022.01); G06V 10/82 (2022.01); G06V 40/20 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20084 (2013.01)] | 24 Claims |

|
1. An electronic device comprising:
a processor; and
a memory comprising instructions that when executed by the processor configure the processor to:
generate, for respective positions of a video feature map corresponding to an input video, a spatio-temporal self-similarity (STSS) tensor comprising a spatial self-similarity map and spatial cross-similarity maps, wherein the spatial self-similarity map comprises similarities between a position in a frame of the input video and neighboring regions in the frame, and the spatial cross-similarity maps comprise similarities between the position in the frame and respectively corresponding positions in other frames of the input video;
generate STSS feature vectors from the STSS tensor;
generate an STSS feature map based on the STSS feature vectors; and
perform an inference on the input video based on the STSS feature map.
|
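The four steps recited in claim 1 (STSS tensor, STSS feature vectors, STSS feature map, inference) can be illustrated with a minimal sketch. It is not the claimed implementation. Everything below is an assumption made for illustration: the nested-list layout `video[t][y][x]` holding a channel vector, cosine similarity as the similarity measure, zero padding outside the video, the window radii `l_rad` and `s_rad`, the max-over-window feature extractor, and the linear classifier weights passed to `infer`.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all-zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def stss_at(video, t, y, x, l_rad, s_rad):
    """STSS tensor for one position, as a dict {l: 2-D map over (dy, dx)}.

    l == 0 gives the spatial self-similarity map (same frame);
    l != 0 gives the spatial cross-similarity maps (other frames).
    Neighbours outside the video are zero-padded (similarity 0.0).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    center = video[t][y][x]
    tensor = {}
    for l in range(-l_rad, l_rad + 1):
        rows = []
        for dy in range(-s_rad, s_rad + 1):
            row = []
            for dx in range(-s_rad, s_rad + 1):
                tt, yy, xx = t + l, y + dy, x + dx
                inside = 0 <= tt < T and 0 <= yy < H and 0 <= xx < W
                row.append(cosine(center, video[tt][yy][xx]) if inside else 0.0)
            rows.append(row)
        tensor[l] = rows
    return tensor

def stss_feature_vector(tensor):
    """Hypothetical extractor: max similarity over the spatial window, per temporal offset."""
    return [max(max(row) for row in tensor[l]) for l in sorted(tensor)]

def stss_feature_map(video, l_rad=1, s_rad=1):
    """STSS feature map: one STSS feature vector per (t, y, x) position."""
    return [[[stss_feature_vector(stss_at(video, t, y, x, l_rad, s_rad))
              for x in range(len(video[0][0]))]
             for y in range(len(video[0]))]
            for t in range(len(video))]

def infer(feature_map, weights):
    """Toy inference: global average pooling, linear scores, argmax class index."""
    vecs = [v for frame in feature_map for row in frame for v in row]
    dim = len(vecs[0])
    pooled = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    scores = [sum(w * p for w, p in zip(ws, pooled)) for ws in weights]
    return scores.index(max(scores))
```

Calling `stss_feature_map(video)` on a small `T x H x W x C` nested list and passing the result to `infer` with one weight row per class walks through all four steps. In a practical system the max-over-window extractor and the linear scorer would likely be replaced by learned layers. The claim itself does not fix a similarity measure or an extractor.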