| CPC G06F 40/35 (2020.01) [G06F 8/38 (2013.01); G06F 16/90332 (2019.01); G06F 16/90335 (2019.01); G06F 16/9038 (2019.01); G06F 40/20 (2020.01); G06F 40/279 (2020.01); G06F 40/284 (2020.01); G06N 3/045 (2023.01); G06N 3/092 (2023.01); G06N 20/00 (2019.01); G10L 13/02 (2013.01)] | 20 Claims |

|
1. A computer-implemented method for training a machine-learned model to generate intermediate tokens for processing by an attention mechanism of the machine-learned model to improve subsequent outputs of the machine-learned model, the method comprising:
obtaining, by a computing system comprising one or more computing devices, a training example for training the machine-learned model to perform a contextual content generation task, wherein the training example comprises:
an initial portion comprising a context sequence;
an intermediate portion comprising an intermediate sequence; and
a response portion comprising a response sequence;
processing, by the computing system, the initial portion using the machine-learned model, wherein the machine-learned model uses the attention mechanism to perform attention over the initial portion;
generating, by the computing system and based on performing attention over the initial portion with the attention mechanism, one or more intermediate token predictions;
determining, by the computing system and using the intermediate sequence, one or more first loss values for the one or more intermediate token predictions;
generating, by the computing system and based on performing attention over the initial portion and the intermediate portion with the attention mechanism, one or more response token predictions;
determining, by the computing system and using the response sequence, one or more second loss values for the one or more response token predictions; and
training, by the computing system, the machine-learned model based on the one or more first loss values and the one or more second loss values.
|
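As an illustration only, the steps recited in claim 1 can be sketched as a teacher-forced loss computation over the three portions of a training example. Everything in this sketch is an assumption made for illustration and is not taken from the patent: the toy single-head attention model, the parameter names `embed` and `unembed`, the helper names, the use of cross-entropy as the loss, and the choice to combine the first and second loss values by simple summation. The final training step (a parameter update based on the combined loss) is not shown; in practice it would typically be carried out with an automatic differentiation framework.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_params(vocab_size, dim, seed=0):
    # Hypothetical toy parameters: one embedding row and one output row per token.
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]

    return {"embed": rand_matrix(), "unembed": rand_matrix()}


def next_token_probs(params, prefix):
    """Attend over `prefix` (last token is the query) and return a vocab distribution."""
    vecs = [params["embed"][t] for t in prefix]
    query = vecs[-1]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in vecs])
    attended = [sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(len(query))]
    return softmax([dot(attended, row) for row in params["unembed"]])


def segment_losses(params, prefix, targets):
    """Teacher-forced cross-entropy for each target token, given the growing prefix."""
    losses = []
    seen = list(prefix)
    for target in targets:
        probs = next_token_probs(params, seen)
        losses.append(-math.log(probs[target]))
        seen.append(target)  # the ground-truth token, not the prediction, extends the prefix
    return losses


def training_losses(params, context, intermediate, response):
    # First loss values: intermediate token predictions, conditioned on the context.
    first = segment_losses(params, context, intermediate)
    # Second loss values: response token predictions, conditioned on context + intermediate.
    second = segment_losses(params, context + intermediate, response)
    # Assumption: the two sets of loss values are combined by an unweighted sum.
    return first, second, sum(first) + sum(second)


params = make_params(vocab_size=6, dim=4)
context, intermediate, response = [0, 1, 2], [3, 4], [5, 1]
first, second, total = training_losses(params, context, intermediate, response)
print(len(first), len(second))
```

In this sketch the intermediate token predictions never see the response sequence, so changing the response tokens leaves the first loss values unchanged, while the response token predictions attend over both the initial and intermediate portions.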