| CPC G10L 13/00 (2013.01) [G10L 17/02 (2013.01)] | 19 Claims |

|
1. A method comprising:
determining, based at least on identification data corresponding to a speaker, an identity embedding associated with the speaker;
activating, based at least on the identity embedding, one or more adapters, from a plurality of adapters included in a text-to-speech (TTS) machine learning model, that correspond to the speaker, wherein each of the plurality of adapters is trained using speaker-specific training data separately from fixed components of the TTS machine learning model;
processing, using the TTS machine learning model including the one or more activated adapters, a textual input to generate a speech representation corresponding to the speaker; and
causing output of audio corresponding to the speech representation.
|
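For illustration only, the four steps recited in claim 1 can be sketched as a toy program. This is a minimal sketch under stated assumptions, not the claimed or any actual implementation: the names `identity_embedding`, `Adapter`, `ToyTTS`, and `speak` are hypothetical, the hash-based embedding stands in for a learned speaker encoder, a single scale factor stands in for separately trained adapter weights, and a list of numbers stands in for the speech representation.

```python
import hashlib
from dataclasses import dataclass, field


def identity_embedding(identification_data: str, dim: int = 8) -> tuple:
    """Hypothetical stand-in for a speaker encoder: hash the ID into a vector."""
    digest = hashlib.sha256(identification_data.encode("utf-8")).digest()
    return tuple(b / 255.0 for b in digest[:dim])


@dataclass
class Adapter:
    """Toy adapter; `scale` stands in for speaker-specific trained weights."""
    scale: float

    def __call__(self, hidden: list) -> list:
        return [h * self.scale for h in hidden]


@dataclass
class ToyTTS:
    """Toy model: a fixed text encoder plus a plurality of per-speaker adapters."""
    adapters: dict = field(default_factory=dict)  # identity embedding -> Adapter
    active: list = field(default_factory=list)

    def register(self, identification_data: str, adapter: Adapter) -> None:
        self.adapters[identity_embedding(identification_data)] = adapter

    def activate(self, embedding: tuple) -> list:
        # Activate only the adapters whose key matches the identity embedding.
        self.active = [a for key, a in self.adapters.items() if key == embedding]
        return self.active

    def synthesize(self, text: str) -> list:
        # Fixed component (assumption): character codes as a fake hidden state.
        hidden = [float(ord(c)) for c in text]
        for adapter in self.active:
            hidden = adapter(hidden)
        return hidden


def speak(model: ToyTTS, identification_data: str, text: str) -> list:
    embedding = identity_embedding(identification_data)  # determining step
    model.activate(embedding)                            # activating step
    representation = model.synthesize(text)              # processing step
    return representation  # a real system would pass this on to audio output


model = ToyTTS()
model.register("alice", Adapter(scale=2.0))
model.register("bob", Adapter(scale=0.5))
print(speak(model, "alice", "hi"))
```

In this sketch the fixed encoder is never modified; only the selected adapter changes the result, which mirrors the claim's separation between speaker-specific adapters and fixed components of the model. A speaker with no registered adapter simply gets the output of the fixed components.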