US 12,347,198 B2
Methods, systems, devices, media and products for video processing
Junfeng Wu, Beijing (CN); Song Bai, Singapore (SG); Yi Jiang, Beijing (CN); Wenqing Zhang, Beijing (CN); and Bin Lu, Los Angeles, CA (US)
Assigned to Beijing Youzhuju Network Technology Co., Ltd., Beijing (CN)
Filed by Beijing Youzhuju Network Technology Co., Ltd., Beijing (CN)
Filed on Jun. 3, 2024, as Appl. No. 18/732,477.
Application 18/732,477 is a continuation of application No. PCT/CN2022/130306, filed on Nov. 7, 2022.
Claims priority of application No. 202111531860.4 (CN), filed on Dec. 14, 2021.
Prior Publication US 2024/0320976 A1, Sep. 26, 2024
Int. Cl. G06V 20/40 (2022.01); G06V 10/764 (2022.01); G06V 10/77 (2022.01)
CPC G06V 20/46 (2022.01) [G06V 10/764 (2022.01); G06V 10/7715 (2022.01); G06V 20/49 (2022.01)]
18 Claims
 
1. A method for video processing, comprising:
extracting a plurality of feature maps from a plurality of frames of a video respectively;
determining a plurality of frame-level features of a video instance in the plurality of frames based on the plurality of feature maps respectively, a frame-level feature in each of the frames representing feature information of the video instance in the frame, wherein the plurality of frame-level features is determined by a plurality of connected processing layers iteratively, and wherein determining the plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps respectively comprises:
obtaining an initial video-level feature of the video instance;
at a first processing layer of the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the first processing layer based on the initial video-level feature and the plurality of feature maps;
at each subsequent processing layer after the first processing layer amongst the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the subsequent processing layer based on a plurality of intermediate frame-level features generated in a previous processing layer and the plurality of feature maps; and
determining a plurality of intermediate frame-level features generated at a last processing layer of the plurality of processing layers as the plurality of frame-level features;
determining a video-level feature of the video instance by aggregating the plurality of frame-level features, the video-level feature representing feature information of the video instance across the plurality of frames; and
determining an analysis result for the video instance in the plurality of frames based at least on the video-level feature.
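
For illustration only, the data flow recited in claim 1 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation: the claim does not specify how a processing layer combines a feature with a feature map, how frame-level features are aggregated, or what form the analysis result takes. Here the layer update is assumed to be softmax attention over feature-map positions with a residual addition, the aggregation is assumed to be a plain mean, and the analysis result is assumed to be a class label chosen by dot-product similarity. All names (`attend`, `frame_level_features`, `aggregate`, `classify`, the class prototypes) are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
FeatureMap = List[Vector]  # one feature vector per spatial position of a frame


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, feature_map: FeatureMap) -> Vector:
    """One processing layer for one frame (assumed update rule):
    softmax-weighted sum of feature-map positions, added to the query."""
    scores = [dot(query, position) for position in feature_map]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    pooled = [
        sum(w * position[i] for w, position in zip(weights, feature_map)) / total
        for i in range(len(query))
    ]
    return [q + p for q, p in zip(query, pooled)]


def frame_level_features(
    initial_video_feature: Vector, feature_maps: List[FeatureMap], num_layers: int
) -> List[Vector]:
    # First layer: every frame starts from the same initial video-level feature.
    features = [attend(initial_video_feature, fm) for fm in feature_maps]
    # Subsequent layers: each frame refines its own feature from the previous layer.
    for _ in range(num_layers - 1):
        features = [attend(f, fm) for f, fm in zip(features, feature_maps)]
    # The output of the last layer is taken as the frame-level features.
    return features


def aggregate(frame_features: List[Vector]) -> Vector:
    """Video-level feature as the mean of frame-level features (assumed)."""
    count = len(frame_features)
    return [sum(f[i] for f in frame_features) / count
            for i in range(len(frame_features[0]))]


def classify(video_feature: Vector, class_prototypes: Dict[str, Vector]) -> str:
    """Analysis result as the best-matching class prototype (assumed)."""
    return max(class_prototypes,
               key=lambda name: dot(video_feature, class_prototypes[name]))


# Toy input: two frames, two spatial positions each, two channels.
feature_maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
]
frames = frame_level_features([1.0, 0.0], feature_maps, num_layers=3)
video = aggregate(frames)
label = classify(video, {"person": [1.0, 0.0], "car": [0.0, 1.0]})
```

In this sketch the per-frame features stay separate through every layer and are combined only once, after the last layer, which mirrors the order of the claimed steps: iterative frame-level refinement first, video-level aggregation second, analysis last.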