US 12,230,258 B2
Sub-models for neural contextual biasing
Fadi Biadsy, Mountain View, CA (US); and Pedro J. Moreno Mengibar, Jersey City, NJ (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Apr. 19, 2022, as Appl. No. 17/659,836.
Prior Publication US 2023/0335122 A1, Oct. 19, 2023
Int. Cl. G10L 15/183 (2013.01); G06N 3/04 (2023.01)
CPC G10L 15/183 (2013.01) [G06N 3/04 (2013.01)]
20 Claims
 
1. A computer-implemented method when executed by data processing hardware causes the data processing hardware to perform operations comprising:
obtaining a base automatic speech recognition (ASR) model trained on non-biased data;
obtaining a sub-model trained on biased data, the biased data representative of a particular domain;
receiving a speech recognition request comprising audio data characterizing an utterance captured in streaming audio;
determining whether the speech recognition request includes a contextual indicator indicating the particular domain;
when the speech recognition request does not include the contextual indicator, generating, using the base ASR model, a first speech recognition result of the utterance by processing the audio data; and
when the speech recognition request includes the contextual indicator:
generating, using the base ASR model, an encoded output by processing the audio data;
biasing, using the sub-model, the base ASR model toward the particular domain;
generating, using the biased base ASR model, a sub-model output by processing the audio data, the sub-model output generated in parallel with the encoded output; and
generating, using a decoder of the base ASR model, a second speech recognition result of the utterance by processing the encoded output and the sub-model output, the second speech recognition result biased toward one or more terms in the particular domain.
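
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. All names here (SpeechRequest, base_encode, music_sub_model, decode, recognize, SUB_MODELS, HYPOTHESES) are hypothetical, the "models" are toy functions that return fixed scores, and the additive combination of the encoded output with the sub-model output is an assumption, since the claim does not say how the decoder combines them. The sketch also collapses "biasing the base ASR model" and "generating a sub-model output" into a single sub-model call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Toy candidate transcripts; a real decoder would search a far larger space.
HYPOTHESES = ["play the beetles", "play the Beatles"]


@dataclass
class SpeechRequest:
    audio: Sequence[float]
    contextual_indicator: Optional[str] = None  # e.g. "music"; None if absent


def base_encode(audio: Sequence[float]) -> List[float]:
    """Toy stand-in for the base ASR encoder: one score per hypothesis."""
    return [0.6, 0.4]


def music_sub_model(audio: Sequence[float]) -> List[float]:
    """Toy stand-in for a sub-model trained on music-domain (biased) data."""
    return [0.0, 0.5]


def decode(encoded: List[float], sub_output: Optional[List[float]] = None) -> str:
    """Toy decoder: add the sub-model scores (assumed combination), take argmax."""
    scores = list(encoded)
    if sub_output is not None:
        scores = [e + s for e, s in zip(encoded, sub_output)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return HYPOTHESES[best]


# Hypothetical registry mapping a contextual indicator to its sub-model.
SUB_MODELS: Dict[str, Callable[[Sequence[float]], List[float]]] = {
    "music": music_sub_model,
}


def recognize(request: SpeechRequest) -> str:
    sub_model = SUB_MODELS.get(request.contextual_indicator)
    if sub_model is None:
        # No (known) contextual indicator: base ASR model only.
        return decode(base_encode(request.audio))
    # Indicator present: compute encoded output and sub-model output in parallel,
    # then let the base model's decoder consume both.
    with ThreadPoolExecutor(max_workers=2) as pool:
        encoded_future = pool.submit(base_encode, request.audio)
        sub_future = pool.submit(sub_model, request.audio)
        return decode(encoded_future.result(), sub_future.result())


if __name__ == "__main__":
    audio = [0.1, 0.2, 0.3]
    print(recognize(SpeechRequest(audio)))
    print(recognize(SpeechRequest(audio, contextual_indicator="music")))
```

In this toy setup the request without an indicator yields the unbiased transcript, and the same audio with the "music" indicator yields the transcript containing the domain term. An indicator with no registered sub-model falls back to the base path, which is a design choice of the sketch rather than something the claim specifies.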