US 12,142,047 B1
Automated audio description system and method
Andrew H. Schwartz, Boston, MA (US); Michael Chalson, Boston, MA (US); Nathanael Beisiegel, Boston, MA (US); Daniel J. Caddigan, Arlington, MA (US); Roger S. Zimmerman, Falmouth, MA (US); Christopher S. Antunes, Winchester, MA (US); and Nicholas R. Moutis, Exeter, NH (US)
Assigned to 3Play Media, Inc., Boston, MA (US)
Filed by 3Play Media, Inc., Boston, MA (US)
Filed on May 31, 2024, as Appl. No. 18/680,449.
Int. Cl. G06V 20/40 (2022.01); G06F 40/166 (2020.01); G10L 13/02 (2013.01); G11B 27/10 (2006.01)
CPC G06V 20/47 (2022.01) [G06F 40/166 (2020.01); G06V 20/41 (2022.01); G06V 20/49 (2022.01); G10L 13/02 (2013.01); G11B 27/10 (2013.01)]
30 Claims
1. An audio description system comprising:
a memory storing source media comprising a plurality of frames positioned within the source media according to a time index; and
at least one processor coupled with the memory, the at least one processor configured to:
generate, using an image-to-text model, a textual description of each frame of the plurality of frames;
identify a plurality of intervals within the time index, each interval of the plurality of intervals encompassing one or more positions of one or more frames of the plurality of frames;
identify a plurality of placement periods within the time index, each placement period of the plurality of placement periods being temporally proximal to an interval of the plurality of intervals;
generate a summary description based on at least one textual description of at least one frame positioned within a selected interval temporally proximal to a placement period of the plurality of placement periods; and
associate the summary description with the placement period.
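The steps recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not 3Play Media's implementation. The image-to-text model is passed in as a plain callable (stubbed with a dictionary lookup in the usage example). Intervals are assumed to be fixed-length windows of frames. "Temporally proximal" is assumed to mean "within a hypothetical `max_gap` seconds". The summary step only deduplicates and joins the frame descriptions, standing in for whatever summarization the claim contemplates. All names (`describe`, `identify_intervals`, `gap`, `summarize`, `window`, `max_gap`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Frame:
    time: float    # position of the frame on the source media's time index (seconds)
    image: object  # pixel data; opaque in this sketch


@dataclass
class Interval:
    start: float
    end: float
    frames: List[Frame]


@dataclass
class PlacementPeriod:
    start: float
    end: float
    summary: Optional[str] = None  # set by the association step


def describe_frames(frames: List[Frame], image_to_text: Callable[[object], str]) -> Dict[float, str]:
    # One textual description per frame, keyed by the frame's time position.
    return {f.time: image_to_text(f.image) for f in frames}


def identify_intervals(frames: List[Frame], window: float) -> List[Interval]:
    # Hypothetical rule: bucket frames into fixed windows of `window` seconds.
    buckets: Dict[int, List[Frame]] = {}
    for f in sorted(frames, key=lambda fr: fr.time):
        buckets.setdefault(int(f.time // window), []).append(f)
    return [Interval(b[0].time, b[-1].time, b) for _, b in sorted(buckets.items())]


def gap(interval: Interval, period: PlacementPeriod) -> float:
    # Zero if the interval and period overlap, otherwise the seconds between them.
    return max(0.0, period.start - interval.end, interval.start - period.end)


def summarize(texts: Iterable[str]) -> str:
    # Stand-in for a summarization model: drop repeats, keep first-seen order.
    return "; ".join(dict.fromkeys(texts))


def describe(frames: List[Frame], periods: List[PlacementPeriod],
             image_to_text: Callable[[object], str],
             window: float = 5.0, max_gap: float = 2.0) -> List[PlacementPeriod]:
    texts = describe_frames(frames, image_to_text)
    intervals = identify_intervals(frames, window)
    for p in periods:
        # Hypothetical proximity test: interval lies within max_gap of the period.
        near = [iv for iv in intervals if gap(iv, p) <= max_gap]
        if not near:
            continue  # no proximal interval; leave this period without a summary
        selected = min(near, key=lambda iv: gap(iv, p))
        p.summary = summarize(texts[f.time] for f in selected.frames)
    return periods


if __name__ == "__main__":
    captions = {"a": "a man enters a kitchen", "b": "a man opens the fridge",
                "c": "a dog sleeps on a couch"}
    frames = [Frame(0.0, "a"), Frame(1.0, "a"), Frame(2.0, "b"), Frame(10.0, "c")]
    periods = [PlacementPeriod(3.0, 5.0), PlacementPeriod(11.0, 13.0), PlacementPeriod(30.0, 32.0)]
    for p in describe(frames, periods, captions.get):
        print(p.start, p.end, p.summary)
    # 3.0 5.0 a man enters a kitchen; a man opens the fridge
    # 11.0 13.0 a dog sleeps on a couch
    # 30.0 32.0 None
```

Selecting the nearest proximal interval, rather than merging all of them, is one simple reading of "a selected interval"; a period with no interval inside `max_gap` is left without a description instead of receiving an unrelated one.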