US 11,929,058 B2
Systems and methods for adapting human speaker embeddings in speech synthesis
Cong Zhou, Fremont, CA (US); Xiaoyu Liu, San Mateo, CA (US); Michael Getty Horgan, San Francisco, CA (US); and Vivek Kumar, Foster City, CA (US)
Assigned to DOLBY LABORATORIES LICENSING CORPORATION, San Francisco, CA (US)
Appl. No. 17/636,851
Filed by DOLBY LABORATORIES LICENSING CORPORATION, San Francisco, CA (US)
PCT Filed Aug. 18, 2020, PCT No. PCT/US2020/046723
§ 371(c)(1), (2) Date Feb. 18, 2022,
PCT Pub. No. WO2021/034786, PCT Pub. Date Feb. 25, 2021.
Claims priority of provisional application 63/023,673, filed on May 12, 2020.
Claims priority of provisional application 62/889,675, filed on Aug. 21, 2019.
Prior Publication US 2022/0335925 A1, Oct. 20, 2022
Int. Cl. G10L 21/00 (2013.01); G10L 13/00 (2006.01); G10L 13/033 (2013.01); G10L 13/047 (2013.01); G10L 13/08 (2013.01); G10L 17/12 (2013.01)
CPC G10L 13/033 (2013.01) [G10L 13/047 (2013.01)]
19 Claims
 
1. A method to synthesize a voice in a target style, comprising:
receiving as input at least one waveform, each corresponding to an utterance in the target style;
extracting features on the at least one waveform and generating at least one embedding vector from the extracted features;
calculating vector distances on an embedding vector of the at least one embedding vector to determine embedding vector distances to each of a plurality of known embedding vectors;
determining a known embedding vector of the known embedding vectors with a shortest distance from the embedding vector;
designating the known embedding vector as an initial embedding vector for a speech synthesizer;
adapting the speech synthesizer based on the initial embedding vector; and
synthesizing a voice in the target style with the adapted speech synthesizer.
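
The distance calculation and nearest-vector selection recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not name a distance metric, so Euclidean distance is an assumption here, and the function names, speaker labels, and three-dimensional vectors are hypothetical. Feature extraction, synthesizer adaptation, and synthesis are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors (assumed metric; the claim names none)."""
    if len(a) != len(b):
        raise ValueError("embedding vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_initial_embedding(
    embedding: Sequence[float],
    known_embeddings: Dict[str, Sequence[float]],
) -> Tuple[str, List[float], Dict[str, float]]:
    """Return the label and vector of the known embedding closest to `embedding`.

    Also returns the distance to every known embedding.
    """
    if not known_embeddings:
        raise ValueError("at least one known embedding vector is required")

    # Distance from the target-style embedding to each known embedding.
    distances = {
        label: euclidean_distance(embedding, vector)
        for label, vector in known_embeddings.items()
    }

    # The known embedding with the shortest distance becomes the initial embedding.
    nearest = min(distances, key=distances.get)
    return nearest, list(known_embeddings[nearest]), distances


# Hypothetical known speaker embeddings and a target-style embedding.
known = {
    "speaker_a": [0.0, 0.0, 1.0],
    "speaker_b": [1.0, 0.0, 0.0],
    "speaker_c": [0.0, 1.0, 0.0],
}
target = [0.9, 0.1, 0.0]

name, initial, dists = select_initial_embedding(target, known)
print(name, initial)
```

In this sketch the returned vector stands in for the initial embedding vector that the claim designates for the speech synthesizer before adaptation.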