CPC: G06F 8/35 (2013.01) [G06F 40/284 (2020.01); G06F 40/30 (2020.01)]
20 Claims

1. A method comprising:
adjusting a token list to include a language token used by a tokenizer for a pretrained language model,
wherein the pretrained language model comprises a set of layers,
wherein the set of layers comprises a set of initial layers, an embedding layer, and an output layer, and
wherein the output layer generates an output vector from an embedding vector generated by the embedding layer;
performing an output layer modification of the output layer to replace the output vector with the embedding vector;
freezing the set of initial layers to generate a set of frozen layers of the pretrained language model that do not update during training; and
training the pretrained language model using the language token, the output layer modification, and the set of frozen layers to form a fine-tuned model from the pretrained language model, wherein training the pretrained language model comprises backpropagating a difference between a training output vector and an expected vector to a set of end layers of the pretrained language model to form the fine-tuned model.
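The claim recites the method at a high level. For concreteness, the following is a minimal sketch of the recited steps in PyTorch with the Hugging Face transformers library. The model name ("gpt2"), the added token string ("[LANG_TOKEN]"), the number of frozen blocks, the mean-squared-error loss, and the randomly generated expected vector are illustrative assumptions and do not appear in the claim itself.

import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a pretrained language model and its tokenizer (model choice is an assumption).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Adjust the token list to include a language token, then resize the
# input embedding matrix so the tokenizer and model stay consistent.
tokenizer.add_tokens(["[LANG_TOKEN]"])
model.resize_token_embeddings(len(tokenizer))

# Output layer modification: replace the vocabulary-projection head with
# an identity mapping, so the model's output vector is the embedding
# vector produced by the layer beneath it.
model.lm_head = nn.Identity()

# Freeze the set of initial layers (here, the input embeddings and the
# early transformer blocks); these frozen layers do not update during training.
num_frozen_blocks = 10  # assumption: how many blocks count as "initial"
frozen_modules = [model.transformer.wte, model.transformer.wpe]
frozen_modules += list(model.transformer.h[:num_frozen_blocks])
for module in frozen_modules:
    for param in module.parameters():
        param.requires_grad = False

# Train: backpropagate the difference between a training output vector
# and an expected vector into the unfrozen end layers only.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
inputs = tokenizer("example input [LANG_TOKEN]", return_tensors="pt")
expected_vector = torch.randn(model.config.n_embd)  # placeholder target

model.train()
outputs = model(**inputs)
# With the identity head, "logits" now carries embedding vectors.
training_output_vector = outputs.logits[0, -1]
loss = nn.functional.mse_loss(training_output_vector, expected_vector)
loss.backward()  # gradients flow only to the unfrozen end layers
optimizer.step()

Read against the claim language: the identity head stands in for the "output layer modification," the frozen early blocks for the "set of frozen layers," and the MSE loss for the backpropagated "difference between a training output vector and an expected vector" that updates the "set of end layers."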