| CPC G10L 15/197 (2013.01) [G10L 15/005 (2013.01); G10L 15/16 (2013.01); G10L 15/22 (2013.01)] | 20 Claims |

|
1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising:
receiving transcribed audio training data comprising training audio data corresponding to an utterance paired with a ground-truth transcription of the utterance;
during a first pass, processing, using a speech recognition model, the training audio data to generate N candidate hypotheses for the utterance, each corresponding candidate hypothesis among the N candidate hypotheses having a respective first pass score;
during a second pass, for each corresponding candidate hypothesis of the N candidate hypotheses:
generating, using a neural network rescoring model, a respective second pass score based on the respective first pass score for the corresponding candidate hypothesis; and
applying a Softmax function to a respective negative edit-distance between the corresponding candidate hypothesis and the ground-truth transcription; and
optimizing model parameters of the neural network rescoring model based on the Softmax function applied to the respective negative edit-distance between the ground-truth transcription and each corresponding candidate hypothesis among the N candidate hypotheses.
|
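The training signal recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claim does not fix the loss form, so the sketch assumes a cross-entropy between a Softmax over the second pass scores and the Softmax over negative edit-distances. It also assumes word-level edit distance, and it takes the second pass scores as plain numbers rather than producing them with a neural network rescoring model. The names `edit_distance`, `softmax`, and `rescoring_loss` are hypothetical.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (word-level is an assumption)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def softmax(values):
    """Numerically stable Softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rescoring_loss(second_pass_scores, hypotheses, reference):
    """Cross-entropy between the target and predicted distributions.

    Target: Softmax over the negative edit-distance of each hypothesis to the
    ground-truth transcription. Predicted: Softmax over the second pass scores.
    The cross-entropy form is an assumption made for this sketch.
    """
    targets = softmax([-edit_distance(reference, h) for h in hypotheses])
    predicted = softmax(second_pass_scores)
    return -sum(t * math.log(p) for t, p in zip(targets, predicted))


# Hypothetical N = 3 candidate hypotheses for one utterance.
reference = "play the next song".split()
hypotheses = [
    "play the next song".split(),  # edit distance 0
    "play the next son".split(),   # edit distance 1
    "pay the text song".split(),   # edit distance 2
]

# Scores that rank hypotheses by closeness to the reference yield a lower loss
# than scores that rank them in the opposite order.
loss_aligned = rescoring_loss([0.0, -1.0, -2.0], hypotheses, reference)
loss_reversed = rescoring_loss([-2.0, -1.0, 0.0], hypotheses, reference)
```

In a trainable setup, the second pass scores would come from the rescoring model, and a gradient-based optimizer would update its parameters to reduce this loss. Under that assumption, hypotheses with smaller edit distance to the ground truth are pushed toward higher second pass scores.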