CPC G10L 13/047 (2013.01) | 18 Claims
1. A method for generating a speech waveform, comprising:
receiving fundamental frequency information, glottal features and vocal tract features associated with an input, wherein the glottal features include a phase feature, a shape feature, and an energy feature, the phase feature being represented by phase vectors, the shape feature being represented by shape vectors, and the energy feature being represented by energy vectors;
generating a glottal waveform based on the fundamental frequency information and the glottal features through a first neural network model, wherein the generating the glottal waveform further comprises:
forming a phase matrix from the phase vectors;
constructing a phase-based weighting matrix by converting the phase matrix nonlinearly through a first part of the first neural network model;
generating a characteristic waveform feature based on the fundamental frequency information, the shape vectors and the energy vectors through a second part of the first neural network model; and
obtaining the glottal waveform based on the phase-based weighting matrix and the characteristic waveform feature; and
generating a speech waveform based on the glottal waveform and the vocal tract features through a second neural network model.
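The data flow recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the layer sizes, the tanh nonlinearity, the random stand-in weights, the use of log-F0, and the way the weighting matrix and characteristic waveform feature are combined (an element-wise product summed per sample) are all assumptions made for illustration, and the names `init`, `dense`, `generate_glottal`, and `generate_speech` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_out):
    """Random weights standing in for trained parameters (hypothetical)."""
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

def dense(x, w, b, act=np.tanh):
    """One fully connected layer; tanh is an assumed nonlinearity."""
    return act(x @ w + b)

def generate_glottal(f0, phase_vecs, shape_vecs, energy_vecs, p):
    """First model: glottal waveform from F0 and glottal features."""
    # Form the phase matrix by stacking the phase vectors row by row.
    phase_matrix = np.stack(phase_vecs)                       # (T, P)
    # First part: nonlinear conversion into a phase-based weighting matrix.
    weighting = dense(phase_matrix, *p["phase"])              # (T, K)
    # Second part: characteristic waveform feature from F0, shape, energy.
    cond = np.concatenate(
        [f0[:, None], np.stack(shape_vecs), np.stack(energy_vecs)], axis=1
    )                                                         # (T, 1 + S + E)
    char_wave = dense(cond, *p["char"])                       # (T, K)
    # Assumed combination: weight the feature and sum over K per sample.
    return np.sum(weighting * char_wave, axis=1)              # (T,)

def generate_speech(glottal, vocal_tract, p):
    """Second model: speech waveform from glottal waveform and vocal tract features."""
    x = np.concatenate([glottal[:, None], vocal_tract], axis=1)  # (T, 1 + V)
    hidden = dense(x, *p["hid"])                                  # (T, H)
    # Linear output layer, one sample per row.
    return dense(hidden, *p["out"], act=lambda v: v)[:, 0]        # (T,)

# Toy dimensions: T samples, P/S/E/V feature widths, K and H hidden widths.
T, P, S, E, V, K, H = 200, 8, 4, 1, 20, 16, 32
params = {
    "phase": init(P, K),
    "char": init(1 + S + E, K),
    "hid": init(1 + V, H),
    "out": init(H, 1),
}

f0 = np.log(np.full(T, 120.0))                  # log-F0 per sample (assumed encoding)
phase_vecs = list(rng.standard_normal((T, P)))  # placeholder phase vectors
shape_vecs = list(rng.standard_normal((T, S)))  # placeholder shape vectors
energy_vecs = list(rng.standard_normal((T, E))) # placeholder energy vectors
vocal_tract = rng.standard_normal((T, V))       # placeholder vocal tract features

glottal = generate_glottal(f0, phase_vecs, shape_vecs, energy_vecs, params)
speech = generate_speech(glottal, vocal_tract, params)
print(glottal.shape, speech.shape)
```

The sketch keeps the two stages separate, mirroring the claim: the first model is split into a phase branch and a shape/energy branch whose outputs are combined into the glottal waveform, and only that waveform plus the vocal tract features reach the second model.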