US 12,481,834 B2
Machine-learned language models which generate intermediate textual analysis in service of contextual text generation
Noam Shazeer, Palo Alto, CA (US); and Daniel De Freitas Adiwardana, Mountain View, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Mar. 13, 2024, as Appl. No. 18/603,756.
Application 18/603,756 is a continuation of application No. 18/164,216, filed on Feb. 3, 2023, granted, now 11,960,848.
Application 18/164,216 is a continuation of application No. 17/749,844, filed on May 20, 2022, granted, now 11,574,131, issued on Feb. 7, 2023.
Claims priority of provisional application 63/191,563, filed on May 21, 2021.
Prior Publication US 2024/0256786 A1, Aug. 1, 2024
Int. Cl. G06F 40/35 (2020.01); G06F 8/38 (2018.01); G06F 16/903 (2019.01); G06F 16/9032 (2019.01); G06F 16/9038 (2019.01); G06F 40/20 (2020.01); G06F 40/279 (2020.01); G06F 40/284 (2020.01); G06N 3/045 (2023.01); G06N 3/092 (2023.01); G06N 20/00 (2019.01); G10L 13/02 (2013.01)

CPC G06F 40/35 (2020.01) [G06F 8/38 (2013.01); G06F 16/90332 (2019.01); G06F 16/90335 (2019.01); G06F 16/9038 (2019.01); G06F 40/20 (2020.01); G06F 40/279 (2020.01); G06F 40/284 (2020.01); G06N 3/045 (2023.01); G06N 3/092 (2023.01); G06N 20/00 (2019.01); G10L 13/02 (2013.01)]

20 Claims

1. A computer-implemented method for training a machine-learned model to generate intermediate tokens for processing by an attention mechanism of the machine-learned model to improve subsequent outputs of the machine-learned model, the method comprising:

obtaining, by a computing system comprising one or more computing devices, a training example for training the machine-learned model to perform a contextual content generation task, wherein the training example comprises:

an initial portion comprising a context sequence;

an intermediate portion comprising an intermediate sequence; and

a response portion comprising a response sequence;

processing, by the computing system, the initial portion using the machine-learned model, wherein the machine-learned model uses the attention mechanism to perform attention over the initial portion;

generating, by the computing system and based on performing attention over the initial portion with the attention mechanism, one or more intermediate token predictions;

determining, by the computing system and using the intermediate sequence, one or more first loss values for the one or more intermediate token predictions;

generating, by the computing system and based on performing attention over the initial portion and the intermediate portion with the attention mechanism, one or more response token predictions;

determining, by the computing system and using the response sequence, one or more second loss values for the one or more response token predictions; and

training, by the computing system, the machine-learned model based on the one or more first loss values and the one or more second loss values.
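The training method of claim 1 can be illustrated with a minimal sketch. It assumes a toy decoder-only model and teacher forcing. The model has one causal self-attention head over fixed random embeddings, and only its output projection is trained. The names `ToyCausalLM`, `segment_losses`, and `train_step`, the token ids, the vocabulary and embedding sizes, the learning rate, and the equal weighting of the two loss terms are all hypothetical and are not taken from the patent. Each intermediate-token prediction comes from a position whose attention covers the context (plus earlier intermediate tokens under teacher forcing). Each response-token prediction comes from a position whose attention covers the context and the intermediate portion. Context tokens receive no loss.

```python
import math
import random

# Hypothetical toy sizes; the claim does not specify any architecture.
VOCAB = 12
DIM = 8


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyCausalLM:
    """Hypothetical toy model: fixed random embeddings, one causal attention
    head without learned projections, and a trainable output projection."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.emb = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB)]
        self.w_out = [[0.0] * VOCAB for _ in range(DIM)]  # zero init -> uniform predictions

    def hidden_states(self, tokens):
        # Position t attends only over positions 0..t (causal attention mask).
        xs = [self.emb[tok] for tok in tokens]
        states = []
        for t in range(len(xs)):
            scores = [sum(a * b for a, b in zip(xs[t], xs[s])) / math.sqrt(DIM)
                      for s in range(t + 1)]
            weights = softmax(scores)
            states.append([sum(w * xs[s][d] for s, w in enumerate(weights))
                           for d in range(DIM)])
        return states

    def logits(self, h):
        return [sum(h[d] * self.w_out[d][v] for d in range(DIM)) for v in range(VOCAB)]


def segment_losses(model, context, intermediate, response):
    """Per-token cross-entropy, split into first (intermediate) and second
    (response) loss values. Context tokens receive no loss."""
    assert context, "need at least one context token to predict the first intermediate token"
    tokens = context + intermediate + response
    start_int, start_resp = len(context), len(context) + len(intermediate)
    first, second, cache = [], [], []
    # Teacher forcing: the state at position t predicts tokens[t + 1].
    for t, h in enumerate(model.hidden_states(tokens[:-1])):
        target = tokens[t + 1]
        probs = softmax(model.logits(h))
        loss = -math.log(probs[target])
        if start_int <= t + 1 < start_resp:
            # Attends over the context (plus earlier intermediate tokens).
            first.append(loss)
            cache.append((h, probs, target, 1.0 / len(intermediate)))
        elif t + 1 >= start_resp:
            # Attends over the context and intermediate portion (plus earlier response tokens).
            second.append(loss)
            cache.append((h, probs, target, 1.0 / len(response)))
    return first, second, cache


def train_step(model, context, intermediate, response, lr=0.1):
    """One gradient step on mean(first) + mean(second); only w_out is updated."""
    first, second, cache = segment_losses(model, context, intermediate, response)
    total = sum(first) / len(first) + sum(second) / len(second)
    grad = [[0.0] * VOCAB for _ in range(DIM)]
    for h, probs, target, weight in cache:
        for v in range(VOCAB):
            # d(cross-entropy)/d(logit_v) = p_v - 1[v == target]
            g = weight * (probs[v] - (1.0 if v == target else 0.0))
            for d in range(DIM):
                grad[d][v] += g * h[d]
    for d in range(DIM):
        for v in range(VOCAB):
            model.w_out[d][v] -= lr * grad[d][v]
    return total  # combined loss measured before this step's update


model = ToyCausalLM(seed=0)
context, intermediate, response = [1, 2, 3], [4, 5], [6, 7, 8]  # hypothetical token ids
losses = [train_step(model, context, intermediate, response) for _ in range(50)]
print(f"combined loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because `w_out` starts at zero, the first combined loss is 2 ln 12 (about 4.970), and it decreases as training proceeds. Updating only the output projection keeps the gradient analytic in a few lines. A practical model would presumably backpropagate the combined loss through all of its parameters with an automatic-differentiation framework.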