US 11,869,482 B2
Speech waveform generation
Yang Cui, Redmond, WA (US); Xi Wang, Redmond, WA (US); Lei He, Redmond, WA (US); and Kao-Ping Soong, Redmond, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Appl. No. 17/272,325
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
PCT Filed Sep. 30, 2018, PCT No. PCT/CN2018/109044
§ 371(c)(1), (2) Date Feb. 28, 2021,
PCT Pub. No. WO2020/062217, PCT Pub. Date Apr. 2, 2020.
Prior Publication US 2021/0193112 A1, Jun. 24, 2021
Int. Cl. G10L 13/047 (2013.01)
CPC G10L 13/047 (2013.01)
18 Claims
1. A method for generating a speech waveform, comprising:
receiving fundamental frequency information, glottal features and vocal tract features associated with an input, wherein the glottal features include a phase feature, a shape feature, and an energy feature, the phase feature being represented by phase vectors, the shape feature being represented by shape vectors, and the energy feature being represented by energy vectors;
generating a glottal waveform based on the fundamental frequency information and the glottal features through a first neural network model, wherein the generating the glottal waveform further comprises:
forming a phase matrix from the phase vectors;
constructing a phase-based weighting matrix by converting the phase matrix nonlinearly through a first part of the first neural network model;
generating a characteristic waveform feature based on the fundamental frequency information, the shape vectors and the energy vectors through a second part of the first neural network model; and
obtaining the glottal waveform based on the phase-based weighting matrix and the characteristic waveform feature; and
generating a speech waveform based on the glottal waveform and the vocal tract features through a second neural network model.
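The two-stage structure of claim 1 can be illustrated with the minimal NumPy sketch below. It is not the patented implementation. Every concrete choice in it is a hypothetical assumption made only for illustration: the layer sizes, the sin/cos encoding of per-sample phases, the softmax weighting over K characteristic components, the log-F0 conditioning, the weighted-sum combination of weights and components, and the use of per-frame FIR filter taps as the output of the second network. The weights are random and untrained, so the result shows only the data flow and array shapes, not realistic speech.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the patent.
K = 8        # number of characteristic-waveform components
HIDDEN = 16  # hidden width of each small network
TAPS = 24    # length of the per-frame vocal tract filter


def init(n_in, n_out):
    """Random, untrained weights for one dense layer (illustration only)."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


def dense(x, w, b):
    """Affine layer applied over the last axis."""
    return x @ w + b


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


class GlottalModel:
    """Sketch of the first neural network model, split into two parts."""

    def __init__(self, shape_dim, energy_dim):
        self.p1, self.p2 = init(2, HIDDEN), init(HIDDEN, K)  # first part
        self.c1 = init(1 + shape_dim + energy_dim, HIDDEN)   # second part
        self.c2 = init(HIDDEN, K)

    def __call__(self, f0, phase_vectors, shape_vectors, energy_vectors):
        # Form the phase matrix (frames x samples per frame) from the phase vectors.
        phase = np.stack(phase_vectors)
        # First part: nonlinear conversion into a phase-based weighting matrix.
        feats = np.stack([np.sin(phase), np.cos(phase)], axis=-1)   # (T, N, 2)
        weights = softmax(dense(np.tanh(dense(feats, *self.p1)), *self.p2))  # (T, N, K)
        # Second part: characteristic waveform feature from F0, shape and energy.
        cond = np.concatenate([np.log(f0)[:, None], shape_vectors, energy_vectors], axis=1)
        char = dense(np.tanh(dense(cond, *self.c1)), *self.c2)     # (T, K)
        # Assumed combination: each sample mixes its frame's K components.
        glottal = np.einsum("tnk,tk->tn", weights, char)
        return glottal.reshape(-1), weights


class VocalTractModel:
    """Sketch of the second model: vocal tract features -> per-frame FIR taps."""

    def __init__(self, vt_dim):
        self.v = init(vt_dim, TAPS)

    def __call__(self, glottal, vt_features, frame_len):
        taps = np.tanh(dense(vt_features, *self.v))  # (T, TAPS)
        speech = np.empty_like(glottal)
        for t, h in enumerate(taps):
            start, stop = t * frame_len, (t + 1) * frame_len
            # Causal filtering of the glottal excitation history with this frame's taps.
            filtered = np.convolve(glottal[:stop], h)[:stop]
            speech[start:stop] = filtered[start:stop]
        return speech


# Usage with synthetic inputs: 5 frames of 80 samples at 16 kHz, constant 120 Hz F0.
T, N, fs = 5, 80, 16000.0
f0 = np.full(T, 120.0)
n = np.arange(T * N)
phase_vectors = list((2 * np.pi * 120.0 * n / fs % (2 * np.pi)).reshape(T, N))
shape_vectors = rng.normal(size=(T, 4))
energy_vectors = rng.normal(size=(T, 1))
vt_features = rng.normal(size=(T, 20))

glottal, weights = GlottalModel(shape_dim=4, energy_dim=1)(
    f0, phase_vectors, shape_vectors, energy_vectors)
speech = VocalTractModel(vt_dim=20)(glottal, vt_features, frame_len=N)
```

Splitting the first model into a phase branch and a conditioning branch mirrors the claim's separation of the phase-based weighting matrix from the characteristic waveform feature. In this sketch the phase alone decides how the K components are mixed at each sample, while F0, shape and energy decide what those components are for the frame.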