| CPC G06V 20/46 (2022.01) [G06V 10/764 (2022.01); G06V 10/7715 (2022.01); G06V 20/49 (2022.01)] | 18 Claims |

|
1. A method for video processing, comprising:
extracting a plurality of feature maps from a plurality of frames of a video respectively;
determining a plurality of frame-level features of a video instance in the plurality of frames based on the plurality of feature maps respectively, a frame-level feature in each of the frames representing feature information of the video instance in the frame, wherein the plurality of frame-level features is determined by a plurality of connected processing layers iteratively, and wherein determining the plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps respectively comprises:
obtaining an initial video-level feature of the video instance;
at a first processing layer of the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the first processing layer based on the initial video-level feature and the plurality of feature maps;
at each subsequent processing layer after the first processing layer amongst the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the subsequent processing layer based on a plurality of intermediate frame-level features generated in a previous processing layer and the plurality of feature maps; and
determining a plurality of intermediate frame-level features generated at a last processing layer of the plurality of processing layers as the plurality of frame-level features;
determining a video-level feature of the video instance by aggregating the plurality of frame-level features, the video-level feature representing feature information of the video instance across the plurality of frames; and
determining an analysis result for the video instance in the plurality of frames based at least on the video-level feature.
|
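The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim does not fix the form of the feature extractor, the processing layers, the aggregation, or the analysis head. The sketch assumes a toy normalising extractor, an attention-style layer with a residual update, mean aggregation, and a linear classifier. All function names (`extract_feature_map`, `processing_layer`, `frame_level_features`, `aggregate`, `analyse`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[Vector]  # one C-dimensional vector per spatial position


def extract_feature_map(frame: FeatureMap) -> FeatureMap:
    """Toy stand-in for a backbone (assumption): L2-normalise each position."""
    out = []
    for vec in frame:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        out.append([x / norm for x in vec])
    return out


def processing_layer(query: Vector, fmap: FeatureMap) -> Vector:
    """One processing layer (assumed form): attend from the query to the
    frame's feature map, then add the attended vector to the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, pos)) / scale for pos in fmap]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    attended = [
        sum(w * pos[c] for w, pos in zip(weights, fmap))
        for c in range(len(query))
    ]
    return [q + a for q, a in zip(query, attended)]


def frame_level_features(
    initial_video_feature: Vector, feature_maps: List[FeatureMap], num_layers: int
) -> List[Vector]:
    """Iterate the connected layers; the last layer's outputs are returned."""
    # First layer: every frame starts from the same initial video-level feature.
    current = [processing_layer(initial_video_feature, fm) for fm in feature_maps]
    # Subsequent layers: each frame's previous intermediate feature is refined
    # against that frame's feature map.
    for _ in range(num_layers - 1):
        current = [processing_layer(prev, fm) for prev, fm in zip(current, feature_maps)]
    return current


def aggregate(frame_features: List[Vector]) -> Vector:
    """Video-level feature as the mean over frames (assumed aggregation)."""
    n = len(frame_features)
    return [sum(column) / n for column in zip(*frame_features)]


def analyse(video_feature: Vector, class_weights: List[Vector]) -> int:
    """Assumed analysis result: index of the highest-scoring class."""
    scores = [sum(w * x for w, x in zip(row, video_feature)) for row in class_weights]
    return scores.index(max(scores))


# Usage: two frames, two spatial positions each, two channels.
frames = [[[3.0, 4.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]]
feature_maps = [extract_feature_map(f) for f in frames]
per_frame = frame_level_features([0.0, 0.0], feature_maps, num_layers=2)
video_feature = aggregate(per_frame)
label = analyse(video_feature, class_weights=[[1.0, 0.0], [0.0, 1.0]])
```

In this sketch the video-level feature is computed only after the last layer, matching the order of steps in the claim. A weighted or learned aggregation could replace the mean without changing the overall structure.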