| CPC G06N 3/088 (2013.01) [G06F 40/284 (2020.01); G06N 3/045 (2023.01)] | 31 Claims |

|
1. A computing system for training a machine-learned model, the computing system comprising:
one or more processors; and
one or more non-transitory computer-readable media that collectively store:
a first language model comprising one or more transformer layers, wherein the first language model includes a plurality of first language model parameters, wherein each first language model parameter of the plurality of first language model parameters is associated with at least one transformer layer of the one or more transformer layers of the first language model;
a second language model comprising one or more transformer layers, wherein the second language model includes a plurality of second language model parameters, wherein each second language model parameter of the plurality of second language model parameters is associated with at least one transformer layer of the one or more transformer layers of the second language model, wherein the one or more transformer layers of the second language model are of a different dimension than the one or more transformer layers of the first language model; and
instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising:
projecting the first language model parameters into a shared space with the second language model parameters; and
training the second language model using a loss function based on a comparison of the projected first language model parameters and the second language model parameters.
|
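
The two operations recited in claim 1 (projecting the first language model parameters into a shared space, then training the second language model on a loss that compares the two parameter sets) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the claim: the projection is a fixed random matrix applied as `proj @ W @ proj^T`, the comparison is a mean squared error, the update is plain gradient descent, and each model is reduced to a single square weight matrix. The names `project`, `mse_loss`, `sgd_step`, `first_w`, `second_w`, and `proj` are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def project(first_w, proj):
    """Assumed projection: map a (d1 x d1) matrix to (d2 x d2) as proj @ W @ proj^T."""
    return matmul(matmul(proj, first_w), transpose(proj))

def mse_loss(projected, second_w):
    """Assumed comparison: mean squared difference between the two parameter sets."""
    n = len(second_w) * len(second_w[0])
    return sum((s - p) ** 2
               for s_row, p_row in zip(second_w, projected)
               for s, p in zip(s_row, p_row)) / n

def sgd_step(projected, second_w, lr):
    """One gradient-descent step on mse_loss with respect to second_w."""
    n = len(second_w) * len(second_w[0])
    return [[s - lr * 2.0 * (s - p) / n for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(second_w, projected)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_first, d_second = 4, 2          # the two models use different layer dimensions
first_w = rand_matrix(d_first, d_first, rng)     # stand-in for first model parameters
second_w = rand_matrix(d_second, d_second, rng)  # stand-in for second model parameters
proj = rand_matrix(d_second, d_first, rng)       # hypothetical fixed projection

projected = project(first_w, proj)               # now the same shape as second_w
loss_before = mse_loss(projected, second_w)
for _ in range(50):
    second_w = sgd_step(projected, second_w, lr=0.5)
loss_after = mse_loss(projected, second_w)
```

In this sketch only the second model's parameters are updated, which matches the claim's "training the second language model". Whether the projection itself is learned, and what form the loss takes, are left open by the claim language.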