CPC G10L 13/10 (2013.01) [G06N 20/00 (2019.01); G10L 17/04 (2013.01); G10L 21/013 (2013.01); G10L 25/63 (2013.01)] | 17 Claims |
1. A computer-implemented method of using a machine learning model for disentanglement of prosody in spoken natural language, the method comprising:
encoding, by a computing device, the spoken natural language to produce content code;
resampling, by the computing device without text transcriptions, the content code to obscure the prosody by applying an unsupervised technique to the machine learning model to generate prosody-obscured content code, the content code being resampled using a similarity-based random resampling technique configured for shortening, using similarity-based down sampling, or lengthening, using similarity-based up sampling, for resampling content code segments with a similarity above a prosody similarity threshold to shorten or lengthen, respectively, such that the content code segments are of equal length to each other to form the prosody-obscured content code; and
decoding, by the computing device, the prosody-obscured content code to synthesize speech indirectly based upon the content code.
|
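The resampling step recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. It assumes the content code is a NumPy array of shape (frames, dimensions), that similarity is the cosine similarity between adjacent frames, and that every segment is resampled to one fixed length. The names `segment_by_similarity`, `resample_segment`, `obscure_prosody`, `threshold`, and `target_len` are hypothetical and do not come from the claim.

```python
import numpy as np

def segment_by_similarity(code, threshold):
    """Split frames into runs of adjacent frames whose cosine similarity
    exceeds the threshold (assumed similarity measure)."""
    unit = code / (np.linalg.norm(code, axis=1, keepdims=True) + 1e-8)
    sims = np.sum(unit[1:] * unit[:-1], axis=1)
    # A new segment starts wherever adjacent similarity is not above threshold.
    boundaries = np.flatnonzero(sims <= threshold) + 1
    return np.split(code, boundaries)

def resample_segment(segment, target_len, rng):
    """Randomly shorten or lengthen one segment to target_len frames,
    preserving the original frame order."""
    n = len(segment)
    if n >= target_len:
        # Down sampling: keep a random subset of frames.
        idx = np.sort(rng.choice(n, size=target_len, replace=False))
    else:
        # Up sampling: keep every frame and repeat randomly chosen ones.
        extra = rng.choice(n, size=target_len - n, replace=True)
        idx = np.sort(np.concatenate([np.arange(n), extra]))
    return segment[idx]

def obscure_prosody(code, threshold=0.9, target_len=4, seed=0):
    """Resample every similarity segment to the same length, which removes
    the duration cues carried by the original segment lengths."""
    rng = np.random.default_rng(seed)
    segments = segment_by_similarity(code, threshold)
    return np.concatenate([resample_segment(s, target_len, rng) for s in segments])
```

Under these assumptions, a content code whose segments span 2, 7, and 4 frames comes out with three segments of 4 frames each, so the decoder no longer sees the original durations. The encoding and decoding steps of the claim are outside the scope of this sketch.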