US 12,142,257 B2
Emotion-based text to speech
Liron Harazi, Elad (IL); Jacob Assa, New York, NY (US); and Alan Bekker, Givaat Shmuel (IL)
Assigned to SNAP INC., Santa Monica, CA (US)
Filed by Snap Inc., Santa Monica, CA (US)
Filed on Feb. 8, 2022, as Appl. No. 17/667,128.
Prior Publication US 2023/0252972 A1, Aug. 10, 2023.
Int. Cl. G10L 13/08 (2013.01); G06F 3/0482 (2013.01); G10L 13/033 (2013.01); G10L 13/047 (2013.01); G10L 25/18 (2013.01)
CPC G10L 13/08 (2013.01) [G06F 3/0482 (2013.01); G10L 13/033 (2013.01); G10L 13/047 (2013.01); G10L 25/18 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
storing a first embedding for a first speaker of a plurality of speakers and a second embedding for a second speaker of the plurality of speakers, the first embedding being associated with a first emotion and the second embedding being associated with a second emotion;
selecting, by one or more processors, the first speaker for virtual speech of one or more words of a text string;
generating, based on the first embedding of the first speaker and the second embedding of the second speaker, a third embedding for the first speaker associated with the second emotion; and
applying the third embedding of the first speaker and the text string to a vocoder to generate an audio stream comprising the one or more words being virtually spoken by the first speaker with the second emotion.
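For orientation, the following is a minimal, hedged Python sketch of the flow recited in claim 1. It assumes that each (speaker, emotion) pair maps to a fixed-length embedding vector and that the third embedding of the generating step can be illustrated as a simple blend of the first and second embeddings; the claim does not disclose a particular derivation, and the names used here (EmbeddingStore, derive_emotion_embedding, synthesize_with_vocoder, EMBED_DIM, blend) are hypothetical, with the vocoder represented by a stub rather than a real synthesis library.

```python
# Hedged illustrative sketch of the claimed emotion-based TTS flow.
# All identifiers are hypothetical; the patent does not disclose a specific
# embedding algebra or vocoder implementation.

import numpy as np

EMBED_DIM = 256  # assumed embedding size


class EmbeddingStore:
    """Stores one embedding per (speaker, emotion) pair (storing step)."""

    def __init__(self):
        self._table = {}

    def put(self, speaker: str, emotion: str, embedding: np.ndarray) -> None:
        self._table[(speaker, emotion)] = embedding

    def get(self, speaker: str, emotion: str) -> np.ndarray:
        return self._table[(speaker, emotion)]


def derive_emotion_embedding(first: np.ndarray, second: np.ndarray,
                             blend: float = 0.5) -> np.ndarray:
    """Generating step: build a third embedding for the first speaker that
    carries the second speaker's emotion. Modeled here as a convex blend of
    the two stored embeddings; the actual derivation may differ."""
    return (1.0 - blend) * first + blend * second


def synthesize_with_vocoder(embedding: np.ndarray, text: str,
                            sample_rate: int = 22050) -> np.ndarray:
    """Placeholder for the vocoder in the applying step. A real system would
    run an acoustic model and neural vocoder; this stub returns silence sized
    to roughly one second per ten characters of text."""
    duration_s = max(1, len(text) // 10)
    return np.zeros(duration_s * sample_rate, dtype=np.float32)


if __name__ == "__main__":
    store = EmbeddingStore()
    # Storing step: first speaker / first emotion, second speaker / second emotion.
    store.put("speaker_a", "neutral", np.random.randn(EMBED_DIM))
    store.put("speaker_b", "happy", np.random.randn(EMBED_DIM))

    # Selecting step: the first speaker is chosen for the text string.
    text = "See you at the party tonight!"
    first = store.get("speaker_a", "neutral")
    second = store.get("speaker_b", "happy")

    # Generating step: third embedding = first speaker's voice, second emotion.
    third = derive_emotion_embedding(first, second)

    # Applying step: third embedding plus text string go to the vocoder.
    audio = synthesize_with_vocoder(third, text)
    print(f"Generated {audio.shape[0]} placeholder audio samples.")
```

The blend parameter is only a stand-in for whatever emotion-transfer mechanism the specification describes; the point of the sketch is the data flow of the four claim elements, not a working synthesis pipeline.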