CPC G10L 13/08 (2013.01) [G06F 3/0482 (2013.01); G10L 13/033 (2013.01); G10L 13/047 (2013.01); G10L 25/18 (2013.01)]; 20 Claims
1. A method comprising:
storing a first embedding for a first speaker of a plurality of speakers and a second embedding for a second speaker of the plurality of speakers, the first embedding being associated with a first emotion and the second embedding being associated with a second emotion;
selecting, by one or more processors, the first speaker for virtual speech of one or more words of a text string;
generating, based on the first embedding of the first speaker and the second embedding of the second speaker, a third embedding for the first speaker associated with the second emotion; and
applying the third embedding of the first speaker and the text string to a vocoder to generate an audio stream comprising the one or more words being virtually spoken by the first speaker with the second emotion.
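The generating step of claim 1 only requires that the third embedding be derived from the first and second embeddings. The claim does not say how. The sketch below is one hypothetical reading. It assumes that each stored embedding is a flat vector whose leading `identity_dims` entries encode speaker identity and whose remaining entries encode emotion. Under that assumption, the third embedding takes the first speaker's identity part and the second speaker's emotion part. The layout, the `SpeakerEmbedding` type, `transfer_emotion`, and `identity_dims` are illustrative names that do not appear in the claim. The vocoder step is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerEmbedding:
    """Hypothetical stored embedding: a speaker label, an emotion label, and a vector.

    Assumed (not from the claim) layout of `vector`:
    [identity dims ... | emotion dims ...]
    """
    speaker: str
    emotion: str
    vector: tuple[float, ...]


def transfer_emotion(source: SpeakerEmbedding,
                     donor: SpeakerEmbedding,
                     identity_dims: int) -> SpeakerEmbedding:
    """Build a 'third embedding' for `source.speaker` carrying `donor.emotion`.

    Keeps the source's identity slice and swaps in the donor's emotion slice.
    This is an illustrative assumption, not the patented method.
    """
    if len(source.vector) != len(donor.vector):
        raise ValueError("embeddings must have the same dimensionality")
    if not 0 < identity_dims < len(source.vector):
        raise ValueError("identity_dims must split the vector into two non-empty parts")
    identity_part = source.vector[:identity_dims]  # who is speaking
    emotion_part = donor.vector[identity_dims:]    # how they sound
    return SpeakerEmbedding(source.speaker, donor.emotion, identity_part + emotion_part)


# Usage: the first speaker is stored with one emotion, the second with another.
first = SpeakerEmbedding("speaker_1", "neutral", (0.1, 0.2, 0.3, 0.4))
second = SpeakerEmbedding("speaker_2", "happy", (0.9, 0.8, 0.7, 0.6))
third = transfer_emotion(first, second, identity_dims=2)
print(third)  # speaker_1 with the "happy" emotion slice taken from speaker_2
```

In a real system, the resulting vector together with the text string would condition the synthesis model and vocoder. A fixed slice split is used here only because it makes the "first speaker, second emotion" combination explicit. Learned disentanglement or embedding arithmetic would be equally plausible readings of the claim.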