CPC G10L 13/08 (2013.01) [G06F 3/0482 (2013.01); G10L 13/033 (2013.01); G10L 13/047 (2013.01); G10L 25/18 (2013.01)]
20 Claims

1. A method comprising:
    storing a first embedding for a first speaker of a plurality of speakers and a second embedding for a second speaker of the plurality of speakers, the first embedding being associated with a first emotion and the second embedding being associated with a second emotion;
    selecting, by one or more processors, the first speaker for virtual speech of one or more words of a text string;
    generating, based on the first embedding of the first speaker and the second embedding of the second speaker, a third embedding for the first speaker associated with the second emotion; and
    applying the third embedding of the first speaker and the text string to a vocoder to generate an audio stream comprising the one or more words being virtually spoken by the first speaker with the second emotion.
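The claim does not say how the third embedding is derived from the first and second embeddings. The sketch below shows one possible approach and is illustrative only. It assumes a hypothetical disentangled embedding in which the first `speaker_dims` values encode speaker identity and the remaining values encode emotion. The names `SpeakerEmbedding`, `transfer_emotion`, and `speaker_dims` are hypothetical and do not come from the claim. Under that assumption, the third embedding keeps the first speaker's identity part and takes the emotion part from the second embedding. The vocoder step is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpeakerEmbedding:
    """Hypothetical disentangled embedding (an assumption, not the claimed format):
    the first `speaker_dims` values encode speaker identity, the rest encode emotion."""
    speaker: str
    emotion: str
    vector: List[float]


def transfer_emotion(first: SpeakerEmbedding,
                     second: SpeakerEmbedding,
                     speaker_dims: int) -> SpeakerEmbedding:
    """Build a third embedding: identity part from `first`, emotion part from `second`."""
    if len(first.vector) != len(second.vector):
        raise ValueError("embeddings must have the same dimensionality")
    if not 0 < speaker_dims < len(first.vector):
        raise ValueError("speaker_dims must split the vector into two non-empty parts")
    identity_part = first.vector[:speaker_dims]  # who is speaking (first speaker)
    emotion_part = second.vector[speaker_dims:]  # how they speak (second emotion)
    return SpeakerEmbedding(first.speaker, second.emotion, identity_part + emotion_part)


# Example usage with made-up 4-dimensional embeddings
first = SpeakerEmbedding("speaker_a", "neutral", [0.1, 0.2, 0.9, 0.0])
second = SpeakerEmbedding("speaker_b", "happy", [0.7, 0.5, 0.0, 0.8])
third = transfer_emotion(first, second, speaker_dims=2)
print(third.speaker, third.emotion, third.vector)  # speaker_a happy [0.1, 0.2, 0.0, 0.8]
```

A practical system would more likely use learned encoders or interpolation in a latent space than a fixed split. The fixed split is used here only because it makes the "first speaker, second emotion" combination explicit.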