US 11,989,939 B2
System and method for enhancing machine learning model for audio/video understanding using gated multi-level attention and temporal adversarial training
Saurabh Sahu, San Bruno, CA (US); and Palash Goyal, San Jose, CA (US)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Jul. 28, 2021, as Appl. No. 17/387,889.
Claims priority of provisional application 63/162,315, filed on Mar. 17, 2021.
Prior Publication US 2022/0300740 A1, Sep. 22, 2022
Int. Cl. G06V 20/40 (2022.01); G06F 18/214 (2023.01)
CPC G06V 20/41 (2022.01) [G06F 18/214 (2023.01)]
27 Claims
1. A method comprising:
obtaining, using at least one processor, audio/video content; and
processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content;
wherein processing the audio/video content comprises, using the trained attention-based machine learning model:
generating a global representation of the audio/video content based on the audio/video content;
generating a local representation of the audio/video content based on different portions of the audio/video content; and
combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content, the audio/video content classified based on the output representation.
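
The data flow recited in claim 1 (a global representation, a local representation built from different portions, and a combination that drives classification) can be illustrated with a minimal sketch. It is not the patented implementation. Every name here (`attention_pool`, `local_representation`, `combine`, `classify`), the fixed-size windows, the shared query vector, the scalar sigmoid gate, and the linear classifier are assumptions chosen for illustration. The claim itself only requires that the two representations be generated and combined, and the weights below are hand-set rather than trained.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(frames: Sequence[Vector], query: Vector) -> Vector:
    """Attention-weighted average of per-frame features (hypothetical pooling)."""
    weights = softmax([dot(f, query) for f in frames])
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]


def global_representation(frames: Sequence[Vector], query: Vector) -> Vector:
    """Attend over the whole clip at once."""
    return attention_pool(frames, query)


def local_representation(frames: Sequence[Vector], query: Vector, window: int) -> Vector:
    """Attend within each fixed-size portion, then average the portion summaries."""
    segments = [frames[i:i + window] for i in range(0, len(frames), window)]
    pooled = [attention_pool(seg, query) for seg in segments]
    dim = len(frames[0])
    return [sum(p[i] for p in pooled) / len(pooled) for i in range(dim)]


def combine(global_rep: Vector, local_rep: Vector, gate_weights: Vector) -> Vector:
    """Assumed gating: g * global + (1 - g) * local, with a scalar sigmoid gate."""
    # gate_weights has length 2 * dim; it scores the concatenated representations.
    g = 1.0 / (1.0 + math.exp(-dot(gate_weights, global_rep + local_rep)))
    return [g * a + (1.0 - g) * b for a, b in zip(global_rep, local_rep)]


def classify(frames: Sequence[Vector], query: Vector, gate_weights: Vector,
             class_weights: Sequence[Vector], window: int = 2) -> int:
    """Return the index of the highest-scoring class for the output representation."""
    output = combine(global_representation(frames, query),
                     local_representation(frames, query, window),
                     gate_weights)
    scores = [dot(w, output) for w in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, given four two-dimensional frame features `[[1, 0], [1, 0], [1, 0], [0, 1]]`, a zero query, a zero gate vector, and class weights `[[1, 0], [0, 1]]`, `classify` returns class index 0, because both representations lean toward the first feature dimension.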