US 11,727,913 B2
Automatically associating context-based sounds with text
Gaurav Verma, Bangalore (IN); Vishwa Vinay, Bangalore (IN); Sneha Chowdary Vinjam, Hyderabad (IN); Siddharth Sahay, Secunderabad (IN); and Mitansh Jain, Delhi (IN)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Dec. 23, 2019, as Appl. No. 16/725,716.
Prior Publication US 2021/0193109 A1, Jun. 24, 2021
Int. Cl. G10L 15/16 (2006.01); G10L 13/033 (2013.01); G10L 13/00 (2006.01); G06F 16/35 (2019.01); G10L 13/08 (2013.01); G06F 3/0482 (2013.01); G06F 3/16 (2006.01); G10L 13/047 (2013.01)
CPC G10L 13/00 (2013.01) [G06F 3/0482 (2013.01); G06F 3/167 (2013.01); G06F 16/358 (2019.01); G10L 13/047 (2013.01); G10L 13/08 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method implemented by at least one processing device, the method comprising:
receiving digital text;
automatically identifying, using a text classification module of a multimodal classification module trained based on texts and sounds, an aurally active word in the digital text;
automatically identifying multiple context-based sounds corresponding to the aurally active word in the digital text using a sound classification module implemented in a deep neural network trained to identify one or more sound tags;
identifying multiple context-based sound identifiers based on the one or more sound tags, each context-based sound identifier being associated with one of the multiple context-based sounds;
displaying the digital text and the multiple context-based sound identifiers;
receiving user selection of a context-based sound of the multiple context-based sounds; and
presenting the digital text concurrently with the context-based sound, including audibly outputting the context-based sound at a higher volume during a time that the aurally active word is determined to be read than during times that the aurally active word is not determined to be read, wherein the higher volume of the context-based sound is based on an attention weight generated by the text classification module for the aurally active word to which the context-based sound is anchored.
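The method steps of claim 1 outline a two-part multimodal pipeline: a text classification module that flags aurally active words and emits attention weights, and a sound classification module that maps those words to sound tags and, in turn, to selectable context-based sound identifiers. The Python sketch below is illustrative only; the toy attention lexicon, the tag and sound tables, and the file names are assumptions for demonstration, not the trained neural modules recited in the claim.

from dataclasses import dataclass
from typing import List

# Hypothetical illustration of the claimed pipeline; module names and the
# keyword lexicon are placeholders, not the patented implementation.

@dataclass
class AurallyActiveWord:
    word: str
    position: int            # token index in the digital text
    attention_weight: float  # weight produced by the text classifier

# Stand-in for the text classification module of the multimodal classifier.
# A real system would compute attention weights with a trained neural model;
# here a toy lexicon assigns fixed weights so the example runs end to end.
TOY_ATTENTION = {"thunder": 0.92, "rain": 0.71, "laughter": 0.65}

def identify_aurally_active_words(text: str) -> List[AurallyActiveWord]:
    words = text.lower().split()
    return [
        AurallyActiveWord(w, i, TOY_ATTENTION[w])
        for i, w in enumerate(words)
        if w in TOY_ATTENTION
    ]

# Stand-in for the sound classification module: maps an aurally active word
# to one or more sound tags, then resolves tags to context-based sound
# identifiers that can be displayed alongside the text for user selection.
TOY_SOUND_TAGS = {"thunder": ["storm", "rumble"], "rain": ["rainfall"]}
TOY_SOUND_LIBRARY = {
    "storm": ["sfx/storm_01.wav", "sfx/storm_02.wav"],
    "rumble": ["sfx/rumble_01.wav"],
    "rainfall": ["sfx/rain_loop.wav"],
}

def context_based_sound_ids(word: AurallyActiveWord) -> List[str]:
    ids: List[str] = []
    for tag in TOY_SOUND_TAGS.get(word.word, []):
        ids.extend(TOY_SOUND_LIBRARY.get(tag, []))
    return ids

if __name__ == "__main__":
    text = "Thunder rolled over the hills as the rain began"
    for aaw in identify_aurally_active_words(text):
        print(aaw.word, aaw.attention_weight, context_based_sound_ids(aaw))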
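The final presentation step raises the volume of the anchored sound while the aurally active word is estimated to be read, with the boost scaled by the attention weight from the text classifier. The sketch below assumes a fixed reading speed to estimate when each word is reached; the reading-speed constant, gain values, and function names are hypothetical and do not reflect the patented timing mechanism.

from typing import Tuple

# Hypothetical sketch of the attention-weighted volume behaviour in the
# final claim step. Timing uses a fixed words-per-minute estimate, which
# is an assumption for illustration only.

WORDS_PER_MINUTE = 200.0
BASE_GAIN = 0.3   # background level of the context-based sound
MAX_BOOST = 0.7   # extra gain available while the anchor word is read

def word_time_window(position: int) -> Tuple[float, float]:
    """Estimated (start, end) seconds at which the reader reaches a word."""
    seconds_per_word = 60.0 / WORDS_PER_MINUTE
    start = position * seconds_per_word
    return start, start + seconds_per_word

def gain_at(t: float, anchor_position: int, attention_weight: float) -> float:
    """Playback gain of the anchored sound at time t during presentation."""
    start, end = word_time_window(anchor_position)
    if start <= t < end:
        return BASE_GAIN + MAX_BOOST * attention_weight
    return BASE_GAIN

if __name__ == "__main__":
    # "thunder" at token index 0 with attention weight 0.92 (see the sketch above)
    for t in (0.0, 0.1, 0.5, 1.0):
        print(f"t={t:.1f}s gain={gain_at(t, 0, 0.92):.2f}")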