CPC G06V 20/47 (2022.01) [G06F 40/166 (2020.01); G06V 20/41 (2022.01); G06V 20/49 (2022.01); G10L 13/02 (2013.01); G11B 27/10 (2013.01)]
30 Claims
1. An audio description system comprising:
a memory storing source media comprising a plurality of frames positioned within the source media according to a time index; and
at least one processor coupled with the memory, the at least one processor configured to:
generate, using an image-to-text model, a textual description of each frame of the plurality of frames;
identify a plurality of intervals within the time index, each interval of the plurality of intervals encompassing one or more positions of one or more frames of the plurality of frames;
identify a plurality of placement periods within the time index, each placement period of the plurality of placement periods being temporally proximal to an interval of the plurality of intervals;
generate a summary description based on at least one textual description of at least one frame positioned within a selected interval temporally proximal to a placement period of the plurality of placement periods; and
associate the summary description with the placement period.
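The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Frame`, `Span`, `nearest_interval`, `summarize`, and `build_audio_description` are hypothetical, the image-to-text model is passed in as an arbitrary callable, and the intervals and placement periods are assumed to have been identified already and are supplied as inputs. "Temporally proximal" is interpreted here as the smallest time gap between a placement period and an interval, and the summary is a simple de-duplicated join of per-frame descriptions; the claim does not prescribe either choice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Frame:
    time: float    # position of the frame within the time index, in seconds
    image: object  # frame content; opaque to this sketch


@dataclass(frozen=True)
class Span:
    """A closed range [start, end] within the time index (hypothetical type)."""
    start: float
    end: float


def nearest_interval(period: Span, intervals: Sequence[Span]) -> Span:
    """Pick the interval with the smallest time gap to the placement period."""
    def gap(interval: Span) -> float:
        if interval.end < period.start:
            return period.start - interval.end
        if interval.start > period.end:
            return interval.start - period.end
        return 0.0  # the two ranges touch or overlap
    return min(intervals, key=gap)


def summarize(descriptions: Sequence[str]) -> str:
    """Assumed summarizer: drop consecutive duplicates, then join."""
    kept: List[str] = []
    for text in descriptions:
        if not kept or kept[-1] != text:
            kept.append(text)
    return " ".join(kept)


def build_audio_description(
    frames: Sequence[Frame],
    intervals: Sequence[Span],
    placement_periods: Sequence[Span],
    image_to_text: Callable[[Frame], str],
    summarizer: Callable[[Sequence[str]], str] = summarize,
) -> Dict[Span, str]:
    # Generate a textual description of each frame.
    descriptions = [image_to_text(frame) for frame in frames]

    associations: Dict[Span, str] = {}
    for period in placement_periods:
        # Select the interval temporally proximal to this placement period.
        interval = nearest_interval(period, intervals)
        # Collect descriptions of frames positioned within that interval.
        texts = [
            text
            for frame, text in zip(frames, descriptions)
            if interval.start <= frame.time <= interval.end
        ]
        if texts:
            # Associate the summary description with the placement period.
            associations[period] = summarizer(texts)
    return associations
```

In this sketch the returned mapping plays the role of the association between each summary description and its placement period; a real system might instead write the summary to a caption track or pass it to a speech synthesizer, which the sketch leaves out.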