CPC G10L 13/10 (2013.01) [G10L 13/06 (2013.01); G10L 25/18 (2013.01)] | 20 Claims |
1. A computer-implemented method, comprising:
receiving input audio data representing an utterance corresponding to a request to create requested synthesized speech;
processing the input audio data using a first component to determine first acoustic-feature data corresponding to a speaker of the utterance;
determining first data representing words corresponding to the requested synthesized speech;
processing the first data to determine second acoustic-feature data;
processing the first acoustic-feature data and the second acoustic-feature data to determine spectrogram data; and
processing the spectrogram data to determine output audio data representing synthesized speech of the words, the synthesized speech corresponding to the speaker.
|
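The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: every function name, feature definition, and constant below is a hypothetical stand-in chosen only to make the sequence of steps concrete. A practical system would presumably use trained models for the speaker encoder, text encoder, spectrogram predictor, and vocoder. The sketch assumes non-empty input audio and non-empty words.

```python
import math
from typing import List


def speaker_features(audio: List[float]) -> List[float]:
    """Hypothetical 'first component': summarize the speaker as [mean, RMS]."""
    n = len(audio)
    mean = sum(audio) / n
    rms = math.sqrt(sum(x * x for x in audio) / n)
    return [mean, rms]


def text_features(words: List[str]) -> List[List[float]]:
    """Hypothetical text encoder: per word, [length, scaled mean char code]."""
    return [[float(len(w)), sum(map(ord, w)) / (len(w) * 128.0)] for w in words]


def to_spectrogram(spk: List[float], txt: List[List[float]],
                   n_bins: int = 8) -> List[List[float]]:
    """Combine both feature sets into one toy spectrogram frame per word."""
    frames = []
    for length, tone in txt:
        # Bin magnitude scales with the speaker's RMS and the word's features.
        frames.append([spk[1] * tone * math.exp(-abs(b - length) / n_bins)
                       for b in range(n_bins)])
    return frames


def vocode(spec: List[List[float]], samples_per_frame: int = 16,
           base_hz: float = 100.0, sample_rate: int = 8000) -> List[float]:
    """Toy vocoder: render each frame as a sum of harmonically spaced sinusoids."""
    out = []
    for frame in spec:
        for t in range(samples_per_frame):
            out.append(sum(
                mag * math.sin(2 * math.pi * base_hz * (b + 1) * t / sample_rate)
                for b, mag in enumerate(frame)))
    return out


def synthesize(input_audio: List[float], words: List[str]) -> List[float]:
    """Chain the steps in the order recited in the claim."""
    spk = speaker_features(input_audio)   # first acoustic-feature data
    txt = text_features(words)            # second acoustic-feature data
    spec = to_spectrogram(spk, txt)       # spectrogram data
    return vocode(spec)                   # output audio data


if __name__ == "__main__":
    utterance = [math.sin(0.1 * i) for i in range(100)]
    audio_out = synthesize(utterance, ["hello", "world"])
    print(len(audio_out))
```

In this sketch the speaker-derived features enter only at the spectrogram stage, so the same words rendered from a different input utterance yield different output audio. This mirrors, in toy form, the claim's requirement that the synthesized speech correspond to the speaker.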