US 11,915,682 B2
Speech synthesis utilizing audio waveform difference signal(s)
Luis Carlos Cobo Rus, San Francisco, CA (US); Nal Kalchbrenner, Amsterdam (NL); Erich Elsen, Naperville, IL (US); and Chenjie Gu, Sunnyvale, CA (US)
Assigned to DeepMind Technologies Limited, London (GB)
Appl. No. 17/610,934
Filed by DeepMind Technologies Limited, London (GB)
PCT Filed May 20, 2019; PCT No. PCT/US2019/033104; § 371(c)(1), (2) Date Nov. 12, 2021; PCT Pub. No. WO2020/231449; PCT Pub. Date Nov. 19, 2020.
Claims priority of provisional application 62/848,314, filed on May 15, 2019.
Prior Publication US 2022/0254330 A1, Aug. 11, 2022
Int. Cl. G10L 13/00 (2006.01); G10L 19/00 (2013.01); G10L 13/047 (2013.01); G10L 13/08 (2013.01); G10L 25/30 (2013.01)

CPC G10L 13/047 (2013.01) [G10L 13/08 (2013.01); G10L 25/30 (2013.01)]

24 Claims

1. A method implemented by one or more processors, the method comprising:

iteratively generating samples of an audio waveform that is synthesized speech of provided text, wherein generating the samples of the audio waveform comprises:

at each iteration of a plurality of sequential iterations:

generating a respective difference signal for the iteration using an autoregressive model, wherein the respective difference signal is a predicted difference based on an amplitude of a respective preceding sample of the audio waveform generated in an immediately preceding iteration and an amplitude of a respective sample for the iteration, wherein an input to the autoregressive model comprises:

a respective representation of at least part of the provided text,

the respective preceding sample of the audio waveform generated in the immediately preceding iteration of the sequential iterations, and

a respective preceding difference signal generated in the immediately preceding iteration; and

determining the respective sample for the iteration using the respective difference signal for the iteration and the respective preceding sample of the audio waveform generated in the immediately preceding iteration, the respective sample for the iteration being one of the samples of the audio waveform; and

causing a client device to render the audio waveform by rendering the samples of the audio waveform.
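
The iterative loop recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The function names (`synthesize`, `toy_model`), the model signature, the zero initial values, and the choice to form each sample by adding the predicted difference to the preceding sample are all assumptions made for illustration; the claim only requires that the sample be determined "using" the difference signal and the preceding sample. The toy model stands in for a trained autoregressive network, and the rendering step on a client device is omitted.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model signature (an assumption, not from the claim text):
# (text representation, preceding sample, preceding difference) -> predicted difference
DiffModel = Callable[[Sequence[float], float, float], float]


def synthesize(
    text_features_per_step: Sequence[Sequence[float]],
    model: DiffModel,
    initial_sample: float = 0.0,  # assumed starting amplitude
    initial_diff: float = 0.0,    # assumed starting difference signal
) -> Tuple[List[float], List[float]]:
    """Generate waveform samples one iteration at a time from predicted differences."""
    samples: List[float] = []
    diffs: List[float] = []
    prev_sample, prev_diff = initial_sample, initial_diff
    for features in text_features_per_step:
        # The model input comprises the text representation, the preceding
        # sample, and the preceding difference signal.
        diff = model(features, prev_sample, prev_diff)
        # Assumption: the sample is the preceding sample plus the difference.
        sample = prev_sample + diff
        samples.append(sample)
        diffs.append(diff)
        prev_sample, prev_diff = sample, diff
    return samples, diffs


def toy_model(features: Sequence[float], prev_sample: float, prev_diff: float) -> float:
    """Stand-in for a trained network: moves toward features[0] with some momentum."""
    return 0.5 * (features[0] - prev_sample) + 0.25 * prev_diff


if __name__ == "__main__":
    # Three iterations, each conditioned on the same one-dimensional text feature.
    samples, diffs = synthesize([[1.0], [1.0], [1.0]], toy_model)
    print(samples)
    print(diffs)
```

Because each iteration consumes only the preceding sample and the preceding difference, the loop carries two scalars of state between iterations; a real system would replace `toy_model` with a learned model and pass the resulting samples to an audio output.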