US 12,412,566 B2
Lookup-table recurrent language model
Ronny Huang, Mountain View, CA (US); Tara N. Sainath, Jersey City, NJ (US); Trevor Strohman, Mountain View, CA (US); and Shankar Kumar, Mountain View, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Feb. 10, 2022, as Appl. No. 17/650,566.
Claims priority of provisional application 63/165,725, filed on Mar. 24, 2021.
Prior Publication US 2022/0310067 A1, Sep. 29, 2022
Int. Cl. G10L 15/16 (2006.01); G06F 40/30 (2020.01); G06N 3/04 (2023.01); G10L 15/08 (2006.01); G10L 15/187 (2013.01); G10L 15/26 (2006.01)

CPC G10L 15/083 (2013.01) [G06N 3/04 (2013.01); G10L 15/16 (2013.01); G10L 15/187 (2013.01); G10L 15/26 (2013.01); G10L 2015/088 (2013.01)]

22 Claims

1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations, the operations comprising:

receiving audio data corresponding to an utterance spoken by a user and captured by a user device;

processing, using a speech recognizer, the audio data to determine a candidate transcription for the spoken utterance, the candidate transcription comprises a sequence of tokens;

for each corresponding token in the sequence of tokens subsequent to an initial token in the sequence of tokens:

determining, using a first embedding table, a token embedding for the corresponding token, the token embedding determined independent from each token in the sequence of tokens that precedes the corresponding token in the sequence of tokens;

determining, using a second embedding table, a n-gram token embedding for a previous sequence of n-gram tokens, based on each token in the sequence of tokens that precedes the corresponding token in the sequence of tokens; and

concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token; and

rescoring, using an external language model, the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.
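
The per-token steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the table sizes, the bigram context length (NGRAM_ORDER), the use of a hash to map the preceding tokens to a row of the second table, the random table contents, and the linear scorer that stands in for the external language model. All names (TOKEN_TABLE, NGRAM_TABLE, ngram_row, concatenated_outputs, rescore) are hypothetical.

```python
import hashlib
import random

# All sizes below are illustrative assumptions, not values from the patent.
VOCAB_SIZE = 16         # number of distinct token ids
TOKEN_DIM = 4           # width of a row in the first embedding table
NGRAM_DIM = 4           # width of a row in the second embedding table
NGRAM_ORDER = 2         # how many preceding tokens form the n-gram context
NGRAM_TABLE_ROWS = 64   # rows in the second embedding table


def make_table(rows, dim, seed):
    """Build a table of random values as a stand-in for trained embeddings."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


TOKEN_TABLE = make_table(VOCAB_SIZE, TOKEN_DIM, seed=0)
NGRAM_TABLE = make_table(NGRAM_TABLE_ROWS, NGRAM_DIM, seed=1)


def ngram_row(prev_tokens):
    """Map the preceding tokens to a row index (hashing is an assumption)."""
    key = ",".join(str(t) for t in prev_tokens).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % NGRAM_TABLE_ROWS


def concatenated_outputs(tokens):
    """Return one concatenated vector per token after the initial token."""
    outputs = []
    for i in range(1, len(tokens)):
        # First table: depends only on the current token, not on its history.
        token_emb = TOKEN_TABLE[tokens[i]]
        # Second table: depends only on the tokens preceding the current one.
        prev = tokens[max(0, i - NGRAM_ORDER):i]
        ngram_emb = NGRAM_TABLE[ngram_row(prev)]
        # List addition concatenates the two embeddings.
        outputs.append(token_emb + ngram_emb)
    return outputs


def rescore(tokens, weights):
    """Stand-in scorer: sums a dot product over the concatenated outputs.

    The claim recites an external language model here; this linear
    function only marks where that model would consume the outputs.
    """
    return sum(
        sum(w * x for w, x in zip(weights, vec))
        for vec in concatenated_outputs(tokens)
    )
```

For a candidate transcription encoded as token ids, such as `[3, 7, 7, 2]`, `concatenated_outputs` returns three vectors of length `TOKEN_DIM + NGRAM_DIM`, one for each token after the initial token. The two occurrences of token 7 share the same first half, because the first-table lookup ignores the preceding tokens.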