CPC G10L 25/12 (2013.01) [G10L 15/16 (2013.01); G10L 15/187 (2013.01); G10L 25/30 (2013.01)] | 20 Claims |
1. A method for processing a speech signal, comprising:
obtaining a speech signal;
generating, using the speech signal, a first sequence of feature vectors;
selecting, from the first sequence of feature vectors, a second sequence of m*n consecutive feature vectors;
generating a third sequence of n intermediate vectors by applying the second sequence to a first neural network model, each intermediate vector corresponding to a subsequence of m consecutive feature vectors in the first sequence;
generating a fourth sequence of n average probability vectors by applying each of the n intermediate vectors to a corresponding second neural network model; and
determining the phones in the second sequence using the fourth sequence of n average probability vectors.
|
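The data flow recited in claim 1 can be illustrated with a minimal Python/NumPy sketch. The claim does not specify the feature type, network architectures, or decoding rule, so everything concrete here is an assumption made for illustration: the log-spectral front end (`frame_features`), the single shared tanh layer standing in for the first neural network model, the n linear-softmax heads standing in for the second neural network models, the argmax decoding, and all function and parameter names. The weights are random, so the output shows only the shapes of the sequences, not a trained recognizer.

```python
import numpy as np


def frame_features(signal, frame_len=400, hop=160):
    """Hypothetical front end: one log-magnitude spectrum per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return np.log(spectrum + 1e-8)  # first sequence, shape (n_frames, d)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def decode_block(features, start, m, n, w1, b1, heads):
    """Run one block of m*n consecutive feature vectors through the sketch.

    w1, b1 : weights of the assumed first model, shapes (m*d, h) and (h,)
    heads  : list of n (weight, bias) pairs, one assumed second model
             per intermediate vector, shapes (h, n_phones) and (n_phones,)
    """
    # Second sequence: m*n consecutive feature vectors.
    second = features[start : start + m * n]
    if second.shape[0] != m * n:
        raise ValueError("not enough feature vectors for an m*n block")
    d = second.shape[1]

    # Third sequence: n intermediate vectors, one per subsequence of
    # m consecutive feature vectors (assumption: concatenate, then one
    # shared dense layer).
    groups = second.reshape(n, m * d)
    intermediate = np.tanh(groups @ w1 + b1)  # shape (n, h)

    # Fourth sequence: each intermediate vector goes to its own head.
    # Assumption: the softmax output is read as phone probabilities
    # averaged over the m frames of that subsequence.
    avg_probs = np.stack(
        [softmax(intermediate[k] @ wk + bk) for k, (wk, bk) in enumerate(heads)]
    )  # shape (n, n_phones)

    # Assumed decoding rule: most probable phone per subsequence.
    phones = avg_probs.argmax(axis=1)
    return intermediate, avg_probs, phones


# Usage with random data and untrained (random) weights.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)  # stand-in for an obtained speech signal
features = frame_features(signal)
m, n, h, n_phones = 4, 3, 16, 5
d = features.shape[1]
w1 = rng.standard_normal((m * d, h)) * 0.01
b1 = np.zeros(h)
heads = [
    (rng.standard_normal((h, n_phones)), np.zeros(n_phones)) for _ in range(n)
]
intermediate, avg_probs, phones = decode_block(features, 0, m, n, w1, b1, heads)
print(intermediate.shape, avg_probs.shape, phones.shape)
```

Giving each of the n intermediate vectors its own head mirrors the claim's "corresponding second neural network model" wording; sharing one head across all positions would be a different design from the one recited.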