CPC G06V 20/41 (2022.01) [G06F 18/214 (2023.01)] | 27 Claims |
1. A method comprising:
obtaining, using at least one processor, audio/video content; and
processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content;
wherein processing the audio/video content comprises, using the trained attention-based machine learning model:
generating a global representation of the audio/video content based on the audio/video content;
generating a local representation of the audio/video content based on different portions of the audio/video content; and
combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content, the audio/video content classified based on the output representation.
|
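By way of non-limiting illustration, the processing recited in claim 1 may be sketched as follows. The sketch is not the claimed implementation. It assumes the audio/video content has already been reduced to a list of per-frame feature vectors, uses a simple dot-product attention pooling as a stand-in for the trained attention-based machine learning model, and combines the two representations by concatenation. The names `attention_pool`, `classify`, `query`, `class_weights`, and `window` are hypothetical, and all weights are placeholders rather than trained values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Attention-weighted average of frame vectors.

    Weights are the softmax of scaled dot products between each frame
    and a query vector (hypothetical stand-in for trained parameters).
    """
    scale = math.sqrt(len(query))
    scores = [sum(f * q for f, q in zip(frame, query)) / scale for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


def classify(frames, query, class_weights, window=2):
    """Return (class index, output representation) for one piece of content."""
    # Global representation: attention over the content as a whole.
    global_rep = attention_pool(frames, query)

    # Local representation: attention within different portions (fixed-size
    # windows, an assumption), then an average of the per-portion summaries.
    portions = [frames[i:i + window] for i in range(0, len(frames), window)]
    pooled = [attention_pool(portion, query) for portion in portions]
    dim = len(query)
    local_rep = [sum(p[d] for p in pooled) / len(pooled) for d in range(dim)]

    # Combine global and local representations (concatenation, an assumption).
    output_rep = global_rep + local_rep

    # Classify from the output representation with a linear scoring layer.
    scores = [sum(w * x for w, x in zip(row, output_rep)) for row in class_weights]
    return scores.index(max(scores)), output_rep
```

In this sketch, `classify(frames, query, class_weights)` would be called with one feature vector per frame, a query vector of the same dimension, and one weight row per class whose length is twice the feature dimension. Other combinations, such as summation or a learned fusion layer, would equally fit the claim language.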