CPC G10L 15/063 (2013.01) [G10L 15/02 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01)] | 20 Claims |
1. Memory hardware storing instructions that, when executed by data processing hardware, cause the data processing hardware to implement an automated speech recognition (ASR) model, the ASR model comprising:
a first encoder configured to:
receive, as input, a sequence of acoustic frames corresponding to an utterance; and
generate, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames;
a second encoder configured to:
receive, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps; and
generate, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame;
a decoder configured to:
receive, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps; and
generate, at each of the plurality of output steps, a first probability distribution over possible speech recognition hypotheses; and
a language model configured to:
receive, as input, the first probability distribution over possible speech hypotheses; and
generate, at each of the plurality of output steps, a rescored probability distribution over possible speech recognition hypotheses to generate a transcription for the utterance.
|
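The data flow recited in claim 1 (first encoder, second encoder, decoder, language model, all run once per output step) can be illustrated with a toy sketch. This is not the claimed implementation. Every function name, the `tanh` and cosine stand-ins for the trained networks, the log-linear rescoring rule, and the `lm_weight` parameter are assumptions introduced here only to make the ordering of the components concrete.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Normalize arbitrary scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def first_encoder(frame: Vector) -> Vector:
    # Hypothetical stand-in for a trained network: maps one acoustic frame
    # to a "first higher order feature representation".
    return [math.tanh(x) for x in frame]


def second_encoder(first_features: Vector) -> Vector:
    # Hypothetical stand-in: consumes the first encoder's output and
    # produces a "second higher order feature representation".
    return [math.tanh(2.0 * x) for x in first_features]


def decoder(second_features: Vector, vocab_size: int) -> Vector:
    # Toy fixed projection followed by softmax, yielding a first probability
    # distribution over hypotheses (here, single tokens).
    scores = [
        sum(f * math.cos((k + 1) * (i + 1)) for i, f in enumerate(second_features))
        for k in range(vocab_size)
    ]
    return softmax(scores)


def language_model_rescore(first_dist: Vector, lm_prior: Vector,
                           lm_weight: float = 0.5) -> Vector:
    # Assumed rescoring rule: log-linear combination of the decoder's
    # distribution with a fixed language model prior, then renormalization.
    # All probabilities are assumed strictly positive.
    scores = [
        math.log(p) + lm_weight * math.log(q)
        for p, q in zip(first_dist, lm_prior)
    ]
    return softmax(scores)


def recognize(frames: Sequence[Vector], vocab: Sequence[str],
              lm_prior: Vector) -> List[str]:
    """Run the four components in order at each output step."""
    transcript = []
    for frame in frames:  # one output step per acoustic frame
        h1 = first_encoder(frame)
        h2 = second_encoder(h1)
        first_dist = decoder(h2, len(vocab))
        rescored = language_model_rescore(first_dist, lm_prior)
        best = max(range(len(vocab)), key=rescored.__getitem__)
        transcript.append(vocab[best])
    return transcript
```

Calling `recognize(frames, vocab, lm_prior)` with a list of equal-length frames, a token list, and a strictly positive prior of the same length as `vocab` returns one token per frame. With a uniform prior the rescoring step leaves the decoder's distribution unchanged, which makes the language model's contribution easy to isolate when experimenting with the sketch.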