CPC G06V 20/40 (2022.01) [G06F 18/214 (2023.01); G06F 18/24317 (2023.01); G06V 20/44 (2022.01)] | 20 Claims |
1. A computer-implemented method for temporally localizing a target action in a video, comprising:
inputting a video into a machine-learned model comprising one or more weakly supervised temporal action localization models;
analyzing the video by the one or more weakly supervised temporal action localization models to determine one or more weighted temporal class activation maps; and
determining a temporal location of a target action in the video based at least in part on the one or more weighted temporal class activation maps;
wherein the machine-learned model comprises a sparse temporal pooling network comprising a first weakly supervised temporal action localization model and a second weakly supervised temporal action localization model.
|
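The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim does not say how the weighted temporal class activation maps are computed or combined. The sketch assumes each of the two models emits one class score and one attention weight per video segment, that a weighted map is the attention weight times the sigmoid of the class score, that the two maps are averaged with a hypothetical mixing factor `alpha`, and that localization is done by thresholding. All function names, parameters, and numbers are illustrative.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def weighted_tcam(class_scores: List[float], attention: List[float]) -> List[float]:
    """Assumed form of a weighted temporal class activation map:
    per-segment attention weight times the sigmoid of the class score."""
    if len(class_scores) != len(attention):
        raise ValueError("class_scores and attention must have the same length")
    return [a * sigmoid(s) for s, a in zip(class_scores, attention)]


def fuse(first: List[float], second: List[float], alpha: float = 0.5) -> List[float]:
    """Combine the maps of the first and second model (assumed weighted average)."""
    if len(first) != len(second):
        raise ValueError("both maps must cover the same segments")
    return [alpha * f + (1.0 - alpha) * s for f, s in zip(first, second)]


def localize(fused: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return half-open [start, end) segment intervals where the map exceeds threshold."""
    intervals: List[Tuple[int, int]] = []
    start = None
    for t, value in enumerate(fused):
        if value > threshold and start is None:
            start = t
        elif value <= threshold and start is not None:
            intervals.append((start, t))
            start = None
    if start is not None:
        intervals.append((start, len(fused)))
    return intervals


# Toy outputs for a six-segment video (made-up numbers, target class only).
first_map = weighted_tcam([-4.0, -4.0, 4.0, 4.0, 4.0, -4.0],
                          [0.1, 0.1, 0.9, 0.9, 0.9, 0.1])
second_map = weighted_tcam([-4.0, 4.0, 4.0, 4.0, -4.0, -4.0],
                           [0.1, 0.9, 0.9, 0.9, 0.1, 0.1])
fused_map = fuse(first_map, second_map)
intervals = localize(fused_map)
print(intervals)  # [(2, 4)]
```

With these toy inputs, only segments 2 and 3 score highly in both models, so they are the only ones that survive the threshold after averaging. Segment indices would still need to be converted to timestamps using the segment length, which the claim leaves open.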