CPC G06V 20/41 (2022.01) [G06F 18/217 (2023.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06T 3/40 (2013.01); G06V 20/44 (2022.01)] | 21 Claims |
1. A method comprising:
obtaining a plurality of video frames of a video;
determining a resolution targeted for action classification for classifying each video frame of the plurality of video frames by analyzing each video frame using a policy network, wherein the policy network has a feature extractor and is trained to determine the resolution targeted for action classification;
rescaling, based on the determined resolution targeted for action classification, each video frame;
routing each rescaled video frame to a classifier of a backbone network, wherein the classifier routed to corresponds to the determined resolution;
classifying each rescaled video frame using the corresponding classifier of the backbone network to obtain a plurality of classifications; and
averaging the classifications to determine an action classification of the video.
|
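The steps recited in claim 1 can be illustrated with a minimal sketch, which is not the patented implementation. Everything named here is an assumption made for illustration: the candidate resolutions in `RESOLUTIONS`, the class count `NUM_CLASSES`, the brightness-based `policy_network` (a toy stand-in for a trained policy network with a feature extractor), the nearest-neighbor `rescale`, and the randomly weighted per-resolution classifiers standing in for the classifiers of a backbone network.

```python
import numpy as np

RESOLUTIONS = (112, 168, 224)  # hypothetical candidate resolutions (pixels per side)
NUM_CLASSES = 5                # hypothetical number of action classes


def policy_network(frame):
    """Toy stand-in for a trained policy network: returns an index into RESOLUTIONS."""
    feature = frame.mean()  # toy "feature extractor": mean pixel intensity in [0, 1]
    # Illustrative rule only: brighter frames are sent to a higher resolution.
    return int(np.digitize(feature, [0.33, 0.66]))


def rescale(frame, size):
    """Nearest-neighbor rescale of an (H, W, C) frame to (size, size, C)."""
    h, w = frame.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return frame[rows][:, cols]


def make_classifier(seed):
    """Build a toy classifier with fixed random weights (not a real backbone)."""
    weights = np.random.default_rng(seed).normal(size=(3, NUM_CLASSES))

    def classify(frame):
        logits = frame.mean(axis=(0, 1)) @ weights  # per-channel means -> logits
        exp = np.exp(logits - logits.max())         # numerically stable softmax
        return exp / exp.sum()

    return classify


# One classifier per resolution, so routing can pick the matching one.
CLASSIFIERS = {res: make_classifier(seed=res) for res in RESOLUTIONS}


def classify_video(frames):
    """Per frame: pick a resolution, rescale, route, classify; then average."""
    per_frame = []
    for frame in frames:
        res = RESOLUTIONS[policy_network(frame)]  # determine the resolution
        scaled = rescale(frame, res)              # rescale the frame
        per_frame.append(CLASSIFIERS[res](scaled))  # route and classify
    probs = np.mean(per_frame, axis=0)            # average the classifications
    return int(np.argmax(probs)), probs


# Usage: eight random RGB frames with values in [0, 1).
frames = np.random.default_rng(0).random((8, 240, 320, 3))
label, probs = classify_video(frames)
```

In this sketch the routing is a dictionary lookup keyed by the chosen resolution, which mirrors the claim's requirement that the classifier routed to corresponds to the determined resolution. Averaging is done over per-frame probability vectors; the claim itself does not specify whether the averaged classifications are probabilities, logits, or another representation.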