US 11,797,782 B2
Cross-lingual voice conversion system and method
Cevat Yerli, Frankfurt am Main (DE)
Assigned to TMRW Foundation IP S. À R.L., Luxembourg (LU)
Filed by TMRW Foundation IP S. À R.L., Luxembourg (LU)
Filed on Dec. 30, 2020, as Appl. No. 17/138,642.
Claims priority of provisional application 62/955,227, filed on Dec. 30, 2019.
Prior Publication US 2021/0200965 A1, Jul. 1, 2021
Int. Cl. G06F 40/58 (2020.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01)
CPC G06F 40/58 (2020.01) [G10L 15/02 (2013.01); G10L 15/063 (2013.01)]
20 Claims
 
1. A method performed by a machine learning system, the method comprising:
receiving, by a voice feature extractor, a first voice audio segment in a first language and a second voice audio segment in a second language;
extracting, by the voice feature extractor respectively from the first voice audio segment and second voice audio segment, audio features comprising first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features;
generating, via a generator of a generative adversarial network (GAN) system from a trained data set, a third voice candidate having the first-voice, speaker-dependent acoustic features and the second-voice, speaker-independent linguistic features, wherein the third voice candidate speaks the second language translated based on the first language;
comparing, via one or more discriminators of the GAN system, the third voice candidate with ground truth data comprising the first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features; and
providing results of the comparing step back to the generator for refining the third voice candidate.
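
The data flow recited in claim 1 (extract features, generate a candidate, compare it with ground truth, feed the result back) can be illustrated with a toy sketch. This is not the patented implementation and not a real GAN: the fixed, non-learned error function below stands in for the trained discriminators, and the candidate is refined directly rather than through a trained generator. All names (VoiceFeatures, extract_features, discriminator_feedback, refine) and the representation of audio segments as short lists of floats are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class VoiceFeatures:
    acoustic: Vector    # speaker-dependent: stands in for "who is speaking"
    linguistic: Vector  # speaker-independent: stands in for "what is said"


def extract_features(segment: Vector) -> VoiceFeatures:
    """Hypothetical extractor: treats the first half of a toy segment as
    acoustic features and the second half as linguistic features."""
    mid = len(segment) // 2
    return VoiceFeatures(acoustic=segment[:mid], linguistic=segment[mid:])


def discriminator_feedback(candidate: Vector, ground_truth: Vector) -> Vector:
    """Fixed stand-in for a discriminator: per-dimension error of the
    candidate against the ground truth (an assumption, not a learned model)."""
    return [c - g for c, g in zip(candidate, ground_truth)]


def refine(first_segment: Vector, second_segment: Vector,
           steps: int = 200, lr: float = 0.1) -> Tuple[Vector, List[float]]:
    """Iteratively refine a third-voice candidate from comparison feedback."""
    first = extract_features(first_segment)
    second = extract_features(second_segment)

    # Ground truth pairs first-voice acoustic features with
    # second-voice linguistic features, as in the comparing step.
    ground_truth = first.acoustic + second.linguistic

    candidate = [0.0] * len(ground_truth)  # initial third-voice candidate
    errors: List[float] = []
    for _ in range(steps):
        feedback = discriminator_feedback(candidate, ground_truth)
        errors.append(sum(e * e for e in feedback))
        # Results of the comparison are provided back to update the candidate.
        candidate = [c - lr * e for c, e in zip(candidate, feedback)]
    return candidate, errors


if __name__ == "__main__":
    candidate, errors = refine([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
    print("first error:", errors[0], "last error:", errors[-1])
```

In this sketch the candidate converges toward the combination of the first voice's acoustic half and the second voice's linguistic half. An adversarial system as claimed would instead train the generator and discriminators jointly on a data set.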