US 11,954,910 B2
Dynamic multi-resolution processing for video classification
Rameswar Panda, Medford, MA (US); Yue Meng, Cambridge, MA (US); Chung-Ching Lin, White Plains, NY (US); Rogerio Schmidt Feris, West Hartford, CT (US); and Aude Jeanne Oliva, Cambridge, MA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US); and Massachusetts Institute of Technology, Cambridge, MA (US)
Filed by International Business Machines Corporation, Armonk, NY (US); and Massachusetts Institute of Technology, Cambridge, MA (US)
Filed on Dec. 26, 2020, as Appl. No. 17/134,315.
Prior Publication US 2022/0215198 A1, Jul. 7, 2022
Int. Cl. G06V 20/40 (2022.01); G06F 18/21 (2023.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01); G06T 3/40 (2006.01)
CPC G06V 20/41 (2022.01) [G06F 18/217 (2023.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06T 3/40 (2013.01); G06V 20/44 (2022.01)]
21 Claims
 
1. A method comprising:
obtaining a plurality of video frames of a video;
determining a resolution targeted for action classification for classifying each video frame of the plurality of video frames by analyzing each video frame using a policy network, wherein the policy network has a feature extractor and is trained to determine the resolution targeted to action classification;
rescaling, based on the determined resolution targeted for action classification, each video frame;
routing each rescaled video frame to a classifier of a backbone network, wherein the classifier routed to corresponds to the determined resolution;
classifying each rescaled video frame using the corresponding classifier of the backbone network to obtain a plurality of classifications; and
averaging the classifications to determine an action classification of the video.
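
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`policy_network`, `rescale`, `make_classifier`, `BACKBONE`, `classify_video`, the candidate resolutions 2 and 4, and the variance threshold) is a hypothetical stand-in chosen for illustration. In particular, the claim requires a trained policy network with a feature extractor, whereas this sketch substitutes a fixed pixel-variance rule, and the per-resolution classifiers are toy functions rather than a learned backbone.

```python
from typing import Callable, List

Frame = List[List[float]]  # grayscale frame: rows of pixel values in [0, 1]

RESOLUTIONS = (2, 4)  # hypothetical candidate side lengths, in pixels


def policy_network(frame: Frame) -> int:
    """Toy stand-in for the trained policy network (assumption).

    The 'feature extractor' here is just pixel variance; a fixed threshold
    then picks the resolution. A real policy would be learned.
    """
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return RESOLUTIONS[1] if variance > 0.05 else RESOLUTIONS[0]


def rescale(frame: Frame, size: int) -> Frame:
    """Nearest-neighbour resize of a frame to size x size."""
    h, w = len(frame), len(frame[0])
    return [[frame[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def make_classifier(size: int) -> Callable[[Frame], List[float]]:
    """Toy two-class classifier that only accepts size x size frames."""
    def classify(frame: Frame) -> List[float]:
        assert len(frame) == size and len(frame[0]) == size
        mean = sum(p for row in frame for p in row) / (size * size)
        return [1.0 - mean, mean]  # scores for class 0 and class 1
    return classify


# One classifier per candidate resolution, standing in for the backbone.
BACKBONE = {size: make_classifier(size) for size in RESOLUTIONS}


def classify_video(frames: List[Frame]):
    """Pick a resolution per frame, rescale, route, classify, then average."""
    per_frame = []
    for frame in frames:
        size = policy_network(frame)             # determine resolution
        scaled = rescale(frame, size)            # rescale the frame
        per_frame.append(BACKBONE[size](scaled)) # route to matching classifier
    n = len(per_frame)
    averaged = [sum(s[k] for s in per_frame) / n
                for k in range(len(per_frame[0]))]
    label = max(range(len(averaged)), key=averaged.__getitem__)
    return label, averaged


flat = [[1.0] * 8 for _ in range(8)]               # uniform frame
split = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]  # left dark, right bright
label, scores = classify_video([flat, split])
```

In this toy run the uniform frame is routed to the lower-resolution classifier and the high-contrast frame to the higher-resolution one, after which the two per-frame score vectors are averaged into a single video-level result.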