US 12,192,600 B2
Multimedia scene break detection
Amir Mazaheri, Mountain View, CA (US); John Trenkle, Albany, CA (US); Jaya Kawale, San Jose, CA (US); Kevin Edward Corcoran, Media, PA (US); Dennis Paul Yost, Salado, TX (US); Anthony Albert Truyoo-Broque, Manhattan Beach, CA (US); and Matthew Adam Elliott, Portland, OR (US)
Assigned to Tubi, Inc., San Francisco, CA (US)
Filed by Tubi, Inc., San Francisco, CA (US)
Filed on Apr. 17, 2023, as Appl. No. 18/301,971.
Prior Publication US 2024/0357217 A1, Oct. 24, 2024
Int. Cl. H04N 21/234 (2011.01); G11B 27/19 (2006.01); H04N 21/233 (2011.01); H04N 21/236 (2011.01); H04N 21/262 (2011.01); H04N 21/433 (2011.01); H04N 21/4545 (2011.01); H04N 21/482 (2011.01); H04N 21/81 (2011.01); H04N 21/845 (2011.01); H04N 21/8549 (2011.01)
CPC H04N 21/8549 (2013.01) [G11B 27/19 (2013.01); H04N 21/233 (2013.01); H04N 21/23418 (2013.01); H04N 21/812 (2013.01); H04N 21/8456 (2013.01)] 20 Claims
 
1. A system for computer vision analysis of a media item, comprising:
a computer processor;
a scene break detection service executing on the computer processor and comprising functionality to:
receive a request for scene break detection on the media item;
perform audio break detection on an audio component of the media item to obtain a set of audio break timestamps corresponding to aurally similar segments of the audio component;
identify a set of video break timestamps, each corresponding to at least one frame of a video component of the media item;
identify a set of candidate scene break timestamps corresponding to instances in which an audio break timestamp of the set of audio break timestamps and a video break timestamp of the set of video break timestamps occur within a predefined proximity of each other;
execute a computer vision scoring model for each candidate scene break timestamp of the set of candidate scene break timestamps by:
identifying a first subset of a set of contiguous shots of the video component preceding the candidate scene break timestamp and a second subset of the set of contiguous shots succeeding the candidate scene break timestamp;
calculating a score for the candidate scene break timestamp representing a visual distance between the first subset of contiguous shots and the second subset of contiguous shots; and
select, based at least on the score of each candidate scene break timestamp of the set of candidate scene break timestamps, a final set of scene break timestamps for performing a media action.
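Illustrative sketch (not part of the claim): the audio break detection step of claim 1 could be approximated by splitting the audio into fixed-hop analysis windows and emitting a break wherever consecutive windows differ strongly, so that each run of windows between breaks is an aurally similar segment. The claim does not specify the features or the distance measure; the function name `audio_break_timestamps`, the per-window feature vectors, the Euclidean distance, and the `threshold` parameter are all hypothetical assumptions made for this example.

```python
import math
from typing import List, Sequence


def audio_break_timestamps(
    features: Sequence[Sequence[float]],
    hop_seconds: float,
    threshold: float,
) -> List[float]:
    """Return timestamps (seconds) where consecutive audio windows differ.

    Hypothetical assumption: features[i] is a feature vector for the audio
    window starting at i * hop_seconds, and all vectors share one length.
    A break is emitted at the start of window i when the Euclidean distance
    between windows i-1 and i exceeds `threshold`.
    """
    breaks: List[float] = []
    for i in range(1, len(features)):
        # math.dist computes the Euclidean distance between two points.
        if math.dist(features[i - 1], features[i]) > threshold:
            breaks.append(i * hop_seconds)
    return breaks


if __name__ == "__main__":
    windows = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1], [0.0, 0.0]]
    print(audio_break_timestamps(windows, hop_seconds=0.5, threshold=1.0))
```

With a 0.5 s hop, the two sharp feature changes in the usage example fall at the starts of windows 2 and 4, giving breaks at 1.0 s and 2.0 s.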
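Illustrative sketch (not part of the claim): one common way to obtain video break timestamps, each tied to a frame, is shot boundary detection by comparing color histograms of consecutive frames. The claim does not say how video breaks are identified, so this histogram-difference technique, the function name `video_break_timestamps`, the frame-rate conversion, and the `threshold` parameter are hypothetical choices for illustration only.

```python
from typing import List, Sequence


def _normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so its bins sum to 1 (all zeros stays all zeros)."""
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)


def video_break_timestamps(
    histograms: Sequence[Sequence[float]],
    fps: float,
    threshold: float,
) -> List[float]:
    """Return timestamps (seconds) of frames that start a new shot.

    Hypothetical assumption: histograms[i] is a color histogram of frame i.
    The L1 distance between consecutive normalized histograms lies in
    [0, 2]; a frame whose distance from the previous frame exceeds
    `threshold` is treated as a shot boundary at time i / fps.
    """
    breaks: List[float] = []
    prev = None
    for i, hist in enumerate(histograms):
        cur = _normalize(hist)
        if prev is not None:
            diff = sum(abs(a - b) for a, b in zip(prev, cur))
            if diff > threshold:
                breaks.append(i / fps)
        prev = cur
    return breaks


if __name__ == "__main__":
    frames = [[10, 0, 0], [9, 1, 0], [0, 0, 10], [0, 1, 9]]
    print(video_break_timestamps(frames, fps=2.0, threshold=0.5))
```

In the usage example only frame 2 changes its color distribution abruptly, so a single break is reported at 2 / 2.0 = 1.0 s.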
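Illustrative sketch (not part of the claim): identifying candidate scene break timestamps where an audio break and a video break fall within a predefined proximity can be done with a sorted search over the video breaks. Reporting the nearest video timestamp as the candidate (on the assumption that frame-aligned times are more precise) is a hypothetical design choice, as are the function name `candidate_scene_breaks` and the `proximity` parameter in seconds.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def candidate_scene_breaks(
    audio_breaks: Sequence[float],
    video_breaks: Sequence[float],
    proximity: float,
) -> List[float]:
    """Return sorted, de-duplicated candidate scene break timestamps.

    For each audio break, the nearest video break is located with a binary
    search; if it lies within `proximity` seconds, that video timestamp is
    kept as a candidate (hypothetical choice: snap to the video break).
    """
    video = sorted(video_breaks)
    candidates = set()
    for t in audio_breaks:
        i = bisect_left(video, t)
        best: Optional[float] = None
        # The nearest video break is either just before or at/after index i.
        for j in (i - 1, i):
            if 0 <= j < len(video):
                d = abs(video[j] - t)
                if d <= proximity and (best is None or d < abs(best - t)):
                    best = video[j]
        if best is not None:
            candidates.add(best)
    return sorted(candidates)


if __name__ == "__main__":
    print(candidate_scene_breaks([1.0, 5.2, 9.0], [0.9, 5.0, 7.0], proximity=0.25))
```

The binary search keeps the matching step at O(n log m) for n audio breaks and m video breaks; here the audio break at 9.0 s has no video break within 0.25 s and yields no candidate.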
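Illustrative sketch (not part of the claim): the scoring step computes a visual distance between the contiguous shots preceding a candidate and those succeeding it. The claim does not define the visual representation or the distance; the `Shot` type, the per-shot embedding vectors, the mean-pooling of each subset, the cosine distance, and the `window` size (number of shots on each side) are all hypothetical assumptions here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Shot:
    start: float            # shot start time in seconds
    end: float              # shot end time in seconds
    embedding: List[float]  # hypothetical per-shot visual feature vector


def _mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def score_candidate(shots: Sequence[Shot], t: float, window: int = 2) -> float:
    """Score candidate time t by the visual distance across it.

    Assumes `shots` is sorted by start time. The first subset is the last
    `window` shots ending at or before t; the second subset is the first
    `window` shots starting at or after t. Returns 0.0 if either is empty.
    """
    before = [s for s in shots if s.end <= t][-window:]
    after = [s for s in shots if s.start >= t][:window]
    if not before or not after:
        return 0.0
    return _cosine_distance(
        _mean_vector([s.embedding for s in before]),
        _mean_vector([s.embedding for s in after]),
    )


if __name__ == "__main__":
    shots = [
        Shot(0.0, 2.0, [1.0, 0.0]),
        Shot(2.0, 4.0, [1.0, 0.0]),
        Shot(4.0, 6.0, [0.0, 1.0]),
        Shot(6.0, 8.0, [0.0, 1.0]),
    ]
    print(score_candidate(shots, 4.0), score_candidate(shots, 2.0))
```

In the usage example the candidate at 4.0 s separates two visually unrelated groups of shots and scores 1.0, while the candidate at 2.0 s, whose succeeding subset is half similar, scores about 0.29.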
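Illustrative sketch (not part of the claim): the final selection is made "based at least on the score" of each candidate, which leaves the exact rule open. One plausible rule, assumed here purely for illustration, keeps candidates whose score meets a minimum and greedily accepts the highest-scoring ones while enforcing a minimum spacing between accepted breaks. The function name `select_scene_breaks` and the `min_score` and `min_gap` parameters are hypothetical.

```python
from typing import Dict, List


def select_scene_breaks(
    scores: Dict[float, float],
    min_score: float,
    min_gap: float,
) -> List[float]:
    """Pick final scene break timestamps from {timestamp: score}.

    Candidates are visited from highest to lowest score. Visiting stops once
    a score falls below `min_score`; a candidate is accepted only if it lies
    at least `min_gap` seconds from every break already accepted.
    """
    chosen: List[float] = []
    for t, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s < min_score:
            break  # all remaining candidates score even lower
        if all(abs(t - c) >= min_gap for c in chosen):
            chosen.append(t)
    return sorted(chosen)


if __name__ == "__main__":
    scores = {10.0: 0.9, 12.0: 0.8, 30.0: 0.7, 50.0: 0.2}
    print(select_scene_breaks(scores, min_score=0.5, min_gap=5.0))
```

Greedy highest-score-first selection keeps the stronger of two nearby candidates (10.0 s over 12.0 s in the usage example) rather than whichever appears first in time.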