| CPC G10L 15/16 (2013.01) [G10L 15/063 (2013.01); G10L 15/183 (2013.01)] | 24 Claims |

|
1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising:
receiving context biasing data, the context biasing data comprising a set of unspoken textual utterances corresponding to a particular context, each unspoken textual utterance in the set of unspoken textual utterances not paired with any corresponding spoken utterance of speech;
obtaining a list of carrier phrases associated with the particular context of the set of unspoken textual utterances;
for each respective unspoken textual utterance in the set of unspoken textual utterances, generating a corresponding training data pair comprising the respective unspoken textual utterance paired with a carrier phrase from among the list of carrier phrases;
for each respective training data pair:
tokenizing the respective training data pair into a sequence of sub-word units;
generating, by a text encoder, at each of a plurality of output steps, a first higher order textual feature representation for a corresponding sub-word unit in the sequence of sub-word units tokenized from the respective training data pair;
receiving, as input to a first decoder of a speech recognition model, the first higher order textual feature representation generated by the text encoder at each of the plurality of output steps; and
generating, by the first decoder, at each of the plurality of output steps, a first probability distribution over possible text units; and
training the speech recognition model based on the first probability distribution over possible text units generated by the first decoder at each of the plurality of output steps for each respective training data pair.
|
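
The operations recited in claim 1 can be illustrated with a minimal, non-authoritative sketch. Everything named here is an assumption made for illustration and is not taken from the claim: the fixed-length sub-word splitter, the embedding table standing in for the text encoder, the single linear layer with softmax standing in for the first decoder, the cross-entropy objective, and the choice to update only the decoder weights. The claim itself does not specify a tokenizer, an architecture, or a loss.

```python
import math
import random


def make_training_pairs(utterances, carrier_phrases, rng):
    """Pair each unspoken textual utterance with one carrier phrase."""
    return [(rng.choice(carrier_phrases), u) for u in utterances]


def tokenize(pair, piece_len=3):
    """Toy sub-word tokenizer (hypothetical): mark word starts with '_' and
    cut each word into pieces of at most `piece_len` characters."""
    carrier, utterance = pair
    units = []
    for word in f"{carrier} {utterance}".lower().split():
        word = "_" + word
        units.extend(word[i:i + piece_len] for i in range(0, len(word), piece_len))
    return units


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyModel:
    """Stand-in text encoder (embedding table) and first decoder (linear + softmax)."""

    def __init__(self, vocab, dim, rng):
        self.index = {u: i for i, u in enumerate(vocab)}
        # Text encoder: one fixed feature vector per sub-word unit.
        self.embed = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in vocab]
        # Decoder weights, one row per possible text unit.
        self.W = [[0.0] * dim for _ in vocab]

    def encode(self, units):
        """One higher order textual feature representation per output step."""
        return [self.embed[self.index[u]] for u in units]

    def decode(self, feature):
        """Probability distribution over possible text units for one step."""
        logits = [sum(w * f for w, f in zip(row, feature)) for row in self.W]
        return softmax(logits)

    def train_step(self, units, lr=0.1):
        """One SGD pass over a token sequence; returns mean cross-entropy.
        Assumption: the target at each step is the input sub-word unit."""
        loss = 0.0
        for unit, feat in zip(units, self.encode(units)):
            probs = self.decode(feat)
            target = self.index[unit]
            loss -= math.log(probs[target])
            for k, row in enumerate(self.W):
                grad = probs[k] - (1.0 if k == target else 0.0)
                for j in range(len(row)):
                    row[j] -= lr * grad * feat[j]
        return loss / len(units)


rng = random.Random(0)
utterances = ["anna kowalski", "priya okafor"]   # hypothetical context biasing data
carriers = ["call", "send a message to"]          # hypothetical carrier phrases
pairs = make_training_pairs(utterances, carriers, rng)
token_seqs = [tokenize(p) for p in pairs]
vocab = sorted({u for seq in token_seqs for u in seq})
model = ToyModel(vocab, dim=8, rng=rng)

losses = []
for _ in range(50):
    epoch_loss = sum(model.train_step(seq) for seq in token_seqs) / len(token_seqs)
    losses.append(epoch_loss)
```

In this sketch no audio is involved at any point: the decoder is trained only from text-derived features, which mirrors the claim's requirement that the textual utterances are not paired with any spoken utterance. A practical system would replace the toy splitter with a learned sub-word vocabulary and the stand-in layers with the actual text encoder and speech recognition decoder.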