US 12,444,420 B2
Audiobook visual aid assistance
Yang Liang, Beijing (CN); Hamid Majdabadi, Ottawa (CA); Manjunath Ravi, Georgetown, TX (US); Ravithej Chikkala, Pflugerville, TX (US); and Su Liu, Austin, TX (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Apr. 27, 2023, as Appl. No. 18/308,166.
Prior Publication US 2024/0363116 A1, Oct. 31, 2024
Int. Cl. G10L 15/26 (2006.01); G10L 15/04 (2013.01)
CPC G10L 15/26 (2013.01) [G10L 15/04 (2013.01)]
20 Claims
1. A computer-implemented method for automatically generating, correlating, and presenting visual content along with an audio stream associated with an audiobook, comprising:
in response to receiving the audio stream, automatically segmenting the audio stream into audio segments;
automatically identifying keywords, topics, and entities associated with each audio segment associated with the audio stream;
automatically identifying and retrieving visual content related to the identified keywords, topics, and entities of each audio segment from a visual content repository;
automatically integrating the identified and retrieved visual content with an audio segment, whereby integrating the identified and retrieved visual content with an audio segment further comprises identifying a display for the visual content on a computing device from a plurality of different displays associated with a plurality of different computing devices, and automatically customizing and synchronizing presentation of the visual content on the display at a designated time correlated with audio content comprising the identified keywords, the topics, and the entities of the audio segment that relate to the visual content, wherein the plurality of different displays associated with the plurality of different computing devices further comprises at least one display associated with a corresponding computing device that is different from a given computing device that is used for accessing and listening to the audio stream; and
automatically generating and displaying the visual content on the display at the designated time during play of the audio stream.
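
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation and makes several simplifying assumptions: the audio stream is assumed to have already been transcribed into (start time, word) pairs, segmentation uses fixed time windows, keyword identification is a plain dictionary lookup, and the visual content repository is an in-memory mapping. All names (`segment_stream`, `identify_keywords`, `choose_display`, `build_cues`, `Cue`, `Segment`) are hypothetical, and preferring a display on a device other than the listening device is just one possible selection policy.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Timestamped words, assumed to come from an upstream speech-to-text step.
    words: list = field(default_factory=list)  # list of (start_seconds, word)


@dataclass
class Cue:
    time: float    # designated time, in seconds, at which to show the visual
    display: str   # identifier of the display chosen for presentation
    visual: str    # identifier of the visual content to show


def segment_stream(words, window=10.0):
    """Split timestamped words into fixed-length time windows (assumed policy)."""
    buckets = {}
    for t, w in words:
        buckets.setdefault(int(t // window), Segment()).words.append((t, w))
    return [buckets[i] for i in sorted(buckets)]


def identify_keywords(segment, repository):
    """Return (time, keyword) pairs for words that have an entry in the repository."""
    hits = []
    for t, w in segment.words:
        key = w.strip(".,;:!?").lower()
        if key in repository:
            hits.append((t, key))
    return hits


def choose_display(displays, listening_device):
    """Pick a display, preferring one on a device other than the listening device."""
    if not displays:
        raise ValueError("no displays available")
    for device, display in displays:
        if device != listening_device:
            return display
    return displays[0][1]


def build_cues(words, repository, displays, listening_device, window=10.0):
    """Produce display cues timed to the moment each keyword is spoken."""
    display = choose_display(displays, listening_device)
    cues = []
    for seg in segment_stream(words, window):
        for t, key in identify_keywords(seg, repository):
            cues.append(Cue(time=t, display=display, visual=repository[key]))
    return cues


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    words = [(0.5, "The"), (1.0, "lighthouse"), (1.6, "stood"),
             (12.2, "near"), (12.9, "the"), (13.4, "Harbor.")]
    repository = {"lighthouse": "lighthouse.png", "harbor": "harbor.png"}
    displays = [("phone", "phone-screen"), ("tv", "living-room-tv")]
    for cue in build_cues(words, repository, displays, listening_device="phone"):
        print(cue)
```

In this sketch, each cue carries the time at which the matching keyword is spoken, so a player could show the visual on the chosen display at that point during playback. A real system would replace the dictionary lookup with topic and entity extraction and the in-memory mapping with an actual content repository.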