US 12,080,272 B2
Attention-based clockwork hierarchical variational encoder
Robert Clark, Hertfordshire (GB); Chun-an Chan, Mountain View, CA (US); and Vincent Wan, London (GB)
Assigned to Google LLC, Mountain View, CA (US)
Appl. No. 17/756,264
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Dec. 10, 2019, PCT No. PCT/US2019/065566
§ 371(c)(1), (2) Date May 20, 2022,
PCT Pub. No. WO2021/118543, PCT Pub. Date Jun. 17, 2021.
Prior Publication US 2022/0415306 A1, Dec. 29, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 13/10 (2013.01); G10L 25/30 (2013.01)
CPC G10L 13/10 (2013.01) [G10L 25/30 (2013.01); G10L 2013/105 (2013.01)]
28 Claims
 
1. A method comprising:
receiving, at data processing hardware, a text utterance having at least one word, each word having at least one syllable, each syllable having at least one phoneme;
selecting, by the data processing hardware, an utterance embedding for the text utterance, the utterance embedding representing an intended prosody; and
for each syllable, using the selected utterance embedding:
predicting, by the data processing hardware, a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable; and
generating, by the data processing hardware, a plurality of fixed-length predicted frames based on the predicted duration for the syllable.
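The per-syllable steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the dot-product attention, the random weights, the 5 ms frame length, the softplus duration mapping, and every name (`attend`, `decode_syllable`, `w_query`, `w_duration`, `FRAME_MS`) are assumptions introduced here for illustration only.

```python
import numpy as np

FRAME_MS = 5  # assumed fixed frame length in milliseconds (hypothetical value)


def attend(query, phoneme_features):
    """Dot-product attention over the phonemes of one syllable (assumed form)."""
    scores = phoneme_features @ query            # one score per phoneme
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ phoneme_features            # attention-weighted context


def decode_syllable(utterance_embedding, syllable_embedding,
                    phoneme_features, w_query, w_duration):
    """Predict a syllable duration, then emit that many fixed-length frames."""
    # The query combines the utterance-level prosody with the syllable embedding.
    query = np.tanh(w_query @ np.concatenate([utterance_embedding,
                                              syllable_embedding]))
    context = attend(query, phoneme_features)
    # Softplus keeps the duration positive; at least one frame is always emitted.
    scale = np.logaddexp(0.0, float(w_duration @ context))
    n_frames = max(1, int(round(10 * scale)))
    # Placeholder frames: the context vector repeated once per predicted frame.
    frames = np.tile(context, (n_frames, 1))
    return n_frames * FRAME_MS, frames


# Toy inputs: untrained random weights and three syllables with 2, 3, and 1 phonemes.
rng = np.random.default_rng(0)
U, S, F = 8, 4, 6  # utterance, syllable, and phoneme-feature sizes (arbitrary)
w_query = rng.normal(size=(F, U + S))
w_duration = rng.normal(size=F)
utterance_embedding = rng.normal(size=U)
syllables = [(rng.normal(size=S), rng.normal(size=(n, F))) for n in (2, 3, 1)]

results = [decode_syllable(utterance_embedding, syl, phon, w_query, w_duration)
           for syl, phon in syllables]
for duration_ms, frames in results:
    print(duration_ms, frames.shape)
```

In this sketch the number of frames generated for a syllable is fixed entirely by its predicted duration, mirroring the order of the two claimed steps; a trained system would replace the random weights and the placeholder frame contents with learned decoders.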