US 11,990,118 B2
Text-to-speech (TTS) processing
Jaime Lorenzo Trueba, Cambridge (GB); Thomas Renaud Drugman, Carnieres (BE); Viacheslav Klimkov, Gdansk (PL); Srikanth Ronanki, Cambridge (GB); Thomas Edward Merritt, Cambridge (GB); Andrew Paul Breen, Norwich (GB); and Roberto Barra-Chicote, Cambridge (GB)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 6, 2023, as Appl. No. 18/206,301.
Application 18/206,301 is a continuation of application No. 17/882,691, filed on Aug. 8, 2022, granted, now 11,735,162.
Application 17/882,691 is a continuation of application No. 16/922,590, filed on Jul. 7, 2020, granted, now 11,410,639, issued on Aug. 9, 2022.
Application 16/922,590 is a continuation of application No. 16/141,241, filed on Sep. 25, 2018, granted, now 10,741,169, issued on Aug. 11, 2020.
Prior Publication US 2024/0013770 A1, Jan. 11, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 13/10 (2013.01); G10L 13/06 (2013.01); G10L 25/18 (2013.01)

CPC G10L 13/10 (2013.01) [G10L 13/06 (2013.01); G10L 25/18 (2013.01)]

20 Claims

1. A computer-implemented method, comprising:

receiving input audio data representing an utterance corresponding to a request to create requested synthesized speech;

processing the input audio data using a first component to determine first acoustic-feature data corresponding to a speaker of the utterance;

determining first data representing words corresponding to the requested synthesized speech;

processing the first data to determine second acoustic-feature data;

processing the first acoustic-feature data and the second acoustic-feature data to determine spectrogram data; and

processing the spectrogram data to determine output audio data representing synthesized speech of the words, the synthesized speech corresponding to the speaker.
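
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim names no models, so every class, function, and parameter below (VoiceCloningPipeline, speaker_encoder, determine_words, text_encoder, spectrogram_decoder, vocoder, and the toy_* stand-ins) is a hypothetical placeholder. In particular, the sketch assumes the words are derived from the input audio, which the claim does not require. The toy stages perform trivial arithmetic only so that the example runs end to end.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]
Spectrogram = List[Vector]  # one inner list per frame


@dataclass
class VoiceCloningPipeline:
    """Hypothetical wiring of the steps in claim 1 (all names are assumptions)."""
    speaker_encoder: Callable[[Sequence[float]], Vector]   # the "first component"
    determine_words: Callable[[Sequence[float]], List[str]]  # yields the "first data"
    text_encoder: Callable[[List[str]], List[Vector]]
    spectrogram_decoder: Callable[[Vector, List[Vector]], Spectrogram]
    vocoder: Callable[[Spectrogram], List[float]]

    def synthesize(self, input_audio: Sequence[float]) -> List[float]:
        # First acoustic-feature data: characterizes the speaker of the utterance.
        speaker_features = self.speaker_encoder(input_audio)
        # First data: the words to be spoken (assumed here to come from the audio).
        words = self.determine_words(input_audio)
        # Second acoustic-feature data: derived from the words.
        text_features = self.text_encoder(words)
        # Both feature sets are combined to produce spectrogram data.
        spectrogram = self.spectrogram_decoder(speaker_features, text_features)
        # Spectrogram data is converted to output audio data.
        return self.vocoder(spectrogram)


# Toy stand-ins, NOT real models: each does trivial arithmetic.
def toy_speaker_encoder(audio: Sequence[float]) -> Vector:
    mean = sum(audio) / len(audio)
    peak = max(abs(sample) for sample in audio)
    return [mean, peak]


def toy_determine_words(audio: Sequence[float]) -> List[str]:
    return ["hello", "world"]  # fixed output; a real system would parse the request


def toy_text_encoder(words: List[str]) -> List[Vector]:
    return [[float(len(word))] for word in words]


def toy_spectrogram_decoder(speaker: Vector, text: List[Vector]) -> Spectrogram:
    # One frame per word: text features followed by the speaker features.
    return [features + speaker for features in text]


def toy_vocoder(spectrogram: Spectrogram) -> List[float]:
    # One output sample per frame.
    return [sum(frame) for frame in spectrogram]


pipeline = VoiceCloningPipeline(
    speaker_encoder=toy_speaker_encoder,
    determine_words=toy_determine_words,
    text_encoder=toy_text_encoder,
    spectrogram_decoder=toy_spectrogram_decoder,
    vocoder=toy_vocoder,
)
output_audio = pipeline.synthesize([0.0, 0.5, -0.5, 1.0])
```

Keeping each stage as a separate callable mirrors the claim's structure, in which the speaker-derived features and the word-derived features are computed independently and meet only at the spectrogram step.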