US 12,361,215 B2
Performing machine learning tasks using instruction-tuned neural networks
Jason Weng Wei, Mountain View, CA (US); Maarten Paul Bosma, Mountain View, CA (US); Yuzhe Zhao, Jr., San Francisco, CA (US); Kelvin Gu, Mountain View, CA (US); and Quoc V. Le, Sunnyvale, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 23, 2021, as Appl. No. 17/561,581.
Prior Publication US 2023/0205994 A1, Jun. 29, 2023
Int. Cl. G06F 40/284 (2020.01); G06F 40/30 (2020.01)

CPC G06F 40/284 (2020.01) [G06F 40/30 (2020.01)]

20 Claims

1. A method performed by one or more computers, the method comprising:

receiving, by the one or more computers, input data that describes an input of a first machine learning task;

receiving, by the one or more computers, first candidate output data that includes tokens representing each candidate classification output in a first plurality of candidate classification outputs of the first machine learning task for the input;

generating, by the one or more computers, an input sequence that includes the input and the first plurality of candidate classification outputs;

processing, by the one or more computers, the input sequence using an auto-regressive neural network to generate a network output that specifies a respective score for each candidate classification output in the first plurality of candidate classification outputs;

generating, by the one or more computers, a first output of the first machine learning task for the input, wherein generating the first output comprises using the respective scores specified by the network output generated by the auto-regressive neural network to select, as the first output, a selected candidate classification output from the first plurality of candidate classification outputs that are represented by the tokens included in the first candidate output data;

receiving, by the one or more computers, input data that describes another input of a second machine learning task;

receiving, by the one or more computers, second candidate output data that includes tokens representing each candidate classification output in a second plurality of candidate classification outputs of the second machine learning task for the other input, wherein the second plurality of candidate classification outputs comprise at least one different candidate classification output than the first plurality of candidate classification outputs;

generating, by the one or more computers, another input sequence that includes the other input and the second plurality of candidate classification outputs;

processing, by the one or more computers, the other input sequence using the auto-regressive neural network to generate another network output that specifies a respective score for each candidate classification output in the second plurality of candidate classification outputs; and

generating, by the one or more computers, a second output of the second machine learning task for the other input, wherein generating the second output comprises using the respective scores specified by the other network output generated by the auto-regressive neural network to select, as the second output, a selected candidate classification output from the second plurality of candidate classification outputs that are represented by the tokens included in the second candidate output data.
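The method of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how the input sequence is formatted or how the network output turns into per-candidate scores, so several things here are assumptions. The "OPTIONS:" / "ANSWER:" template is assumed. Scoring each candidate by the summed log-likelihood of its tokens, generated auto-regressively after the input sequence, is also assumed. The hand-written `toy_logprob` function is a hypothetical stand-in for a trained auto-regressive neural network. All function names are illustrative only. The sketch shows the two features the claim emphasizes: the candidates are placed in the input sequence, and a single scoring model handles two tasks with different candidate sets.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface for the auto-regressive network: returns a
# log-score for `token` given `prefix`, i.e. every token that precedes it.
LogProbFn = Callable[[Sequence[str], str], float]


def build_input_sequence(task_input: str, candidates: Sequence[str]) -> List[str]:
    """Place the task input and every candidate output in one token sequence.

    The "OPTIONS:" and "ANSWER:" markers are an assumed template; the claim
    only requires that both the input and the candidates are included.
    """
    tokens = task_input.lower().split()
    tokens.append("OPTIONS:")
    for cand in candidates:
        tokens.append("-")
        tokens.extend(cand.lower().split())
    tokens.append("ANSWER:")
    return tokens


def score_candidates(logprob: LogProbFn, input_seq: Sequence[str],
                     candidates: Sequence[str]) -> Dict[str, float]:
    """Score each candidate as the summed log-score of its tokens when they
    are generated auto-regressively after the input sequence (assumed rule)."""
    scores: Dict[str, float] = {}
    for cand in candidates:
        prefix = list(input_seq)
        total = 0.0
        for tok in cand.lower().split():
            total += logprob(prefix, tok)
            prefix.append(tok)  # later tokens are conditioned on earlier ones
        scores[cand] = total
    return scores


def classify(logprob: LogProbFn, task_input: str,
             candidates: Sequence[str]) -> Tuple[str, Dict[str, float]]:
    """Build the sequence, score every candidate, and select the highest."""
    seq = build_input_sequence(task_input, candidates)
    scores = score_candidates(logprob, seq, candidates)
    best = max(candidates, key=lambda c: scores[c])
    return best, scores


def toy_logprob(prefix: Sequence[str], token: str) -> float:
    """Hypothetical hand-written stand-in for a trained network: returns
    log(0.9) if a cue word in the prefix points to `token`, else log(0.1).
    These scores are unnormalized and serve only to illustrate the flow."""
    cues = {"great": "positive", "awful": "negative", "moves": "yes"}
    hinted = {cues[t] for t in prefix if t in cues}
    return math.log(0.9 if token in hinted else 0.1)


if __name__ == "__main__":
    # First task: sentiment classification with two candidate outputs.
    print(classify(toy_logprob, "The movie was great",
                   ["positive", "negative"])[0])
    # Second task: a different candidate set, scored by the same function.
    print(classify(toy_logprob,
                   "Premise: a dog moves quickly. Hypothesis: a dog moves",
                   ["yes", "it is not possible to tell", "no"])[0])
```

Run as a script, this prints `positive` and then `yes`. Because scores are summed over tokens, longer candidates such as "it is not possible to tell" receive lower scores. A length-normalized score (for example, the mean per-token log-score) is one possible alternative. The claim does not require either choice.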