CPC G10L 15/16 (2013.01) [G06N 3/02 (2013.01); G10L 15/02 (2013.01)] | 20 Claims |
1. A computer-implemented method that, when executed on data processing hardware, causes the data processing hardware to perform operations comprising:
while a user is speaking a current utterance:
obtaining feature vectors indicative of audio characteristics of corresponding portions of the current utterance that have been spoken by the user;
for each predetermined duration of new audio received that characterizes a corresponding portion of the current utterance, calculating a corresponding i-vector;
providing, as input to a neural network acoustic model, the feature vectors and the corresponding i-vector calculated for each predetermined duration of new audio received; and
based on the feature vectors and the corresponding i-vector calculated for each predetermined duration of new audio received and provided as input to the neural network acoustic model, determining, as output from an output layer of the neural network acoustic model, a posterior probability distribution of possible speech units representing each feature vector.
|
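The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and every name, dimension, and weight in it is a hypothetical chosen for illustration. In particular, `estimate_ivector` is only a stand-in (a fixed projection of the mean of the frames received so far). A real i-vector extractor would rely on a trained universal background model and total variability matrix, which the claim does not specify. The sketch also assumes that each i-vector is computed from all audio of the current utterance received so far and is paired with the frames of the newest portion.

```python
import numpy as np

# All sizes below are hypothetical; the claim does not fix any of them.
FEAT_DIM = 40            # size of each feature vector
IVEC_DIM = 8             # size of the (stand-in) i-vector
HIDDEN = 32              # hidden units in the toy acoustic model
NUM_UNITS = 10           # number of possible speech units
FRAMES_PER_UPDATE = 10   # frames in one "predetermined duration" of new audio

rng = np.random.default_rng(0)
# Random, untrained parameters; a real system would load trained weights.
PROJ = rng.standard_normal((FEAT_DIM, IVEC_DIM)) * 0.1
W1 = rng.standard_normal((FEAT_DIM + IVEC_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, NUM_UNITS)) * 0.1
b2 = np.zeros(NUM_UNITS)


def estimate_ivector(frames_so_far):
    """Stand-in for i-vector extraction (assumption, not a real extractor)."""
    return frames_so_far.mean(axis=0) @ PROJ


def acoustic_model(inputs):
    """Toy feed-forward network whose output layer is a softmax."""
    hidden = np.maximum(0.0, inputs @ W1 + b1)
    logits = hidden @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def stream_posteriors(frames):
    """Process an utterance portion by portion, as if it were still being spoken."""
    posteriors, ivectors = [], []
    for start in range(0, len(frames), FRAMES_PER_UPDATE):
        chunk = frames[start:start + FRAMES_PER_UPDATE]
        # Recompute the i-vector once per predetermined duration of new audio.
        ivec = estimate_ivector(frames[:start + len(chunk)])
        ivectors.append(ivec)
        # Provide each feature vector together with the corresponding i-vector.
        inputs = np.hstack([chunk, np.tile(ivec, (len(chunk), 1))])
        posteriors.append(acoustic_model(inputs))
    return np.vstack(posteriors), np.array(ivectors)


# Synthetic features standing in for a 35-frame utterance.
utterance = rng.standard_normal((35, FEAT_DIM))
posteriors, ivectors = stream_posteriors(utterance)
```

Each row of `posteriors` is a probability distribution over the hypothetical speech units for one feature vector, and `ivectors` holds one row per predetermined duration of new audio, including the final, shorter portion.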