| CPC G10L 15/083 (2013.01) [G06N 3/04 (2013.01); G10L 15/16 (2013.01); G10L 15/187 (2013.01); G10L 15/26 (2013.01); G10L 2015/088 (2013.01)] | 22 Claims |

|
1. A computer-implemented method that, when executed on data processing hardware, causes the data processing hardware to perform operations, the operations comprising:
receiving audio data corresponding to an utterance spoken by a user and captured by a user device;
processing, using a speech recognizer, the audio data to determine a candidate transcription for the spoken utterance, the candidate transcription comprising a sequence of tokens;
for each corresponding token in the sequence of tokens subsequent to an initial token in the sequence of tokens:
determining, using a first embedding table, a token embedding for the corresponding token, the token embedding determined independent from each token in the sequence of tokens that precedes the corresponding token in the sequence of tokens;
determining, using a second embedding table, an n-gram token embedding for a previous sequence of n-gram tokens, based on each token in the sequence of tokens that precedes the corresponding token in the sequence of tokens; and
concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token; and
rescoring, using an external language model, the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.
|
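
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and everything in it is an assumption made for illustration: the integer token ids, the table sizes, the hashing of the preceding tokens into a row of the second table, the single linear layer standing in for the external language model, and the weighted sum used to combine the first-pass score with the language model score. The names `make_table`, `ngram_row`, `concatenated_outputs`, `toy_lm_score`, and `rescore` are hypothetical.

```python
import hashlib
import random

TOKEN_DIM = 4       # assumed width of a token embedding
NGRAM_DIM = 4       # assumed width of an n-gram embedding
NUM_BUCKETS = 1024  # assumed number of rows in the n-gram table


def make_table(num_rows, dim, seed):
    """Build a toy embedding table filled with fixed pseudo-random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


def ngram_row(prev_tokens):
    """Hash the preceding tokens to a row index of the n-gram table (assumed scheme)."""
    key = " ".join(str(t) for t in prev_tokens).encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % NUM_BUCKETS


def concatenated_outputs(tokens, token_table, ngram_table):
    """Return one concatenated vector for each token after the initial token."""
    outputs = []
    for i in range(1, len(tokens)):
        # First table: looked up by the current token alone, so the result
        # does not depend on any token that precedes it.
        token_emb = token_table[tokens[i]]
        # Second table: looked up by every token that precedes position i.
        ngram_emb = ngram_table[ngram_row(tokens[:i])]
        outputs.append(token_emb + ngram_emb)  # list concatenation
    return outputs


def toy_lm_score(outputs, weights):
    """Stand-in for the external language model: one linear scoring layer."""
    return sum(sum(w * x for w, x in zip(weights, vec)) for vec in outputs)


def rescore(first_pass_score, tokens, token_table, ngram_table, weights, lm_weight=0.5):
    """Combine the recognizer score with the toy LM score (assumed weighted sum)."""
    outputs = concatenated_outputs(tokens, token_table, ngram_table)
    return first_pass_score + lm_weight * toy_lm_score(outputs, weights)


if __name__ == "__main__":
    token_table = make_table(10, TOKEN_DIM, seed=0)
    ngram_table = make_table(NUM_BUCKETS, NGRAM_DIM, seed=1)
    weights = [0.1] * (TOKEN_DIM + NGRAM_DIM)
    candidate = [3, 5, 7, 5]  # hypothetical token ids from the speech recognizer
    print(rescore(-2.0, candidate, token_table, ngram_table, weights))
```

In this sketch the token id 5 appears twice in the candidate and receives the same token embedding both times, while its n-gram embedding differs because the preceding tokens differ. That mirrors the distinction the claim draws between the first and second embedding tables.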