| CPC G10L 15/183 (2013.01) [G06N 3/04 (2013.01)] | 20 Claims |

|
1. A computer-implemented method that, when executed by data processing hardware, causes the data processing hardware to perform operations comprising:
obtaining a base automatic speech recognition (ASR) model trained on non-biased data;
obtaining a sub-model trained on biased data, the biased data representative of a particular domain;
receiving a speech recognition request comprising audio data characterizing an utterance captured in streaming audio;
determining whether the speech recognition request includes a contextual indicator indicating the particular domain;
when the speech recognition request does not include the contextual indicator, generating, using the base ASR model, a first speech recognition result of the utterance by processing the audio data; and
when the speech recognition request includes the contextual indicator:
generating, using the base ASR model, an encoded output by processing the audio data;
biasing, using the sub-model, the base ASR model toward the particular domain;
generating, using the biased base ASR model, a sub-model output by processing the audio data, the sub-model output generated in parallel with the encoded output; and
generating, using a decoder of the base ASR model, a second speech recognition result of the utterance by processing the encoded output and the sub-model output, the second speech recognition result biased toward one or more terms in the particular domain.
|
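
The control flow recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: every name here (`Request`, `base_encode`, `sub_model_output`, `decode`, `recognize`) is hypothetical, the "models" are toy arithmetic stand-ins, and the result is a number rather than a transcript. The sketch only shows the branch on the contextual indicator and the two outputs being produced in parallel before a shared decoder combines them.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical speech recognition request."""
    audio: List[float]            # audio data characterizing the utterance
    domain: Optional[str] = None  # contextual indicator; None means absent


def base_encode(audio: List[float]) -> List[float]:
    # Toy stand-in for the base ASR model producing an encoded output.
    return [x * 2.0 for x in audio]


def sub_model_output(audio: List[float], domain: str) -> List[float]:
    # Toy stand-in for the domain-biased path: one bias value per frame.
    # The bias here is simply the length of the domain name (an assumption).
    return [float(len(domain)) for _ in audio]


def decode(encoded: List[float], bias: Optional[List[float]] = None) -> float:
    # Toy stand-in for the base model's decoder. With a bias present it
    # combines the encoded output and the sub-model output frame by frame.
    if bias is None:
        return sum(encoded)
    return sum(e + b for e, b in zip(encoded, bias))


def recognize(request: Request) -> float:
    # No contextual indicator: the base model alone produces the result.
    if request.domain is None:
        return decode(base_encode(request.audio))

    # Contextual indicator present: compute the encoded output and the
    # sub-model output in parallel, then decode both together.
    with ThreadPoolExecutor(max_workers=2) as pool:
        encoded_future = pool.submit(base_encode, request.audio)
        bias_future = pool.submit(sub_model_output, request.audio, request.domain)
        return decode(encoded_future.result(), bias_future.result())


unbiased = recognize(Request(audio=[0.5, 1.0]))
biased = recognize(Request(audio=[0.5, 1.0], domain="med"))
```

In this sketch a thread pool stands in for whatever parallel execution the data processing hardware provides; the claim itself does not specify a mechanism.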