US 12,266,175 B2
Combining visual and audio insights to detect opening scenes in multimedia files
Yonit Hoffman, Herzliya (IL); Mordechai Kadosh, Raanana (IL); Zvi Figov, Modiin (IL); Eliyahu Strugo, Tel Aviv (IL); Mattan Serry, Herzliya (IL); and Michael Ben-Haym, Modiin-Maccabim reut (IL)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed on Dec. 29, 2022, as Appl. No. 18/090,843.
Prior Publication US 2024/0221379 A1, Jul. 4, 2024
Int. Cl. H04N 5/932 (2006.01); G06V 10/70 (2022.01); G06V 20/40 (2022.01); G06V 20/62 (2022.01); G06V 30/244 (2022.01); G10L 15/00 (2013.01); G10L 15/26 (2006.01); G10L 25/48 (2013.01); G11B 27/10 (2006.01)
CPC G06V 20/41 (2022.01) [G06V 10/70 (2022.01); G06V 20/46 (2022.01); G06V 20/49 (2022.01); G06V 20/62 (2022.01); G06V 30/245 (2022.01); G10L 15/005 (2013.01); G10L 15/26 (2013.01); G10L 25/48 (2013.01); G11B 27/102 (2013.01)]
21 Claims
1. A method implemented by a computing system for training a machine learning model to classify scenes in multimedia content, the method comprising:
identifying a scene in the multimedia content of a media file;
identifying a feature associated with the scene;
scoring the scene for a probability that the scene corresponds to an opening song based on a classification weight of the model that is assigned to the feature;
based at least in part on the probability that the scene corresponds to the opening song, classifying the scene as correlating to the opening song, or alternatively, classifying the scene as not correlating to the opening song; and
based on the classification for the scene, modifying a classification weight of the machine learning model.
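
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify a model family, a feature set, or an update rule, so the logistic scoring function, the gradient-style weight update, the use of a ground-truth label, and every name below (`Scene`, `OpeningSongModel`, `train`, the feature names in the usage note) are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Scene:
    """A scene identified in a media file (hypothetical structure)."""
    start_sec: float
    end_sec: float
    features: dict  # feature name -> numeric value, e.g. {"music_present": 1.0}


@dataclass
class OpeningSongModel:
    """Toy classifier: one classification weight per feature (assumed logistic form)."""
    weights: dict = field(default_factory=dict)
    bias: float = 0.0
    threshold: float = 0.5
    learning_rate: float = 0.5

    def score(self, scene: Scene) -> float:
        # Probability that the scene corresponds to an opening song,
        # computed from the weights assigned to the scene's features.
        z = self.bias + sum(
            self.weights.get(name, 0.0) * value
            for name, value in scene.features.items()
        )
        return 1.0 / (1.0 + math.exp(-z))

    def classify(self, scene: Scene) -> bool:
        # True: correlates to the opening song; False: does not.
        return self.score(scene) >= self.threshold

    def update(self, scene: Scene, is_opening_song: bool) -> None:
        # Assumption: a ground-truth label is available, and weights are
        # adjusted in proportion to the prediction error.
        error = (1.0 if is_opening_song else 0.0) - self.score(scene)
        for name, value in scene.features.items():
            self.weights[name] = (
                self.weights.get(name, 0.0) + self.learning_rate * error * value
            )
        self.bias += self.learning_rate * error


def train(model: OpeningSongModel, labeled_scenes, epochs: int = 50) -> None:
    """Score, classify, and adjust weights for each labeled scene, repeatedly."""
    for _ in range(epochs):
        for scene, label in labeled_scenes:
            model.classify(scene)        # classification step
            model.update(scene, label)   # weight modification step
```

As a usage example under the same assumptions, training on one scene with features such as `{"music_present": 1.0, "credits_text": 1.0}` labeled as an opening song, and one scene with `{"speech_ratio": 1.0}` labeled as not, should leave `classify` returning `True` for the former and `False` for the latter.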