| CPC G06F 18/2148 (2023.01) [G06F 18/2163 (2023.01); G06F 40/00 (2020.01)] | 20 Claims |

|
1. A method for training a neural network model, the method comprising:
dividing the neural network model stored in a memory into a held-out model and a main model;
during a first forward pass on the held-out model, determining, using a training dataset comprising words in a natural language, held-out model hidden states from attention heads of the held-out model;
determining a first loss based on the first forward pass and a first backward pass on the held-out model;
during a second forward pass on the main model:
determining, using the training dataset, main model hidden states from attention heads of the main model;
concatenating the held-out model hidden states and the main model hidden states into concatenated hidden states; and
propagating the concatenated hidden states through a subset of layers of the main model, wherein the concatenated hidden states cause the main model to recognize language patterns different from those recognized by the held-out model;
determining a second loss based on the second forward pass and a second backward pass on the main model;
updating parameters of the held-out model based on the first loss; and
updating parameters of the main model based on the second loss.
|
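The claimed training loop can be illustrated with a minimal, dependency-free Python sketch. The sketch rests on several assumptions that the claim does not state. Each model is a single causal self-attention head with its own word embeddings. The "subset of layers of the main model" is reduced to one output layer. Both losses are next-word cross-entropy. The backward passes are approximated with finite-difference gradients, and updates use plain SGD. All function names (`make_model`, `attention_head`, `held_out_forward`, `main_loss`, `finite_difference_sgd_step`, `train_step`) are hypothetical. The held-out hidden states are passed to the main model as fixed values. This reflects an assumed reading of the claim in which the held-out model is updated from the first loss alone.

```python
import math
import random

# Hypothetical toy instance of the claimed method. One causal attention head per
# model, next-word cross-entropy, finite-difference gradients, and plain SGD are
# all illustrative assumptions; the claim does not specify any of them.

def init_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(vec, mat):
    # Row vector (length r) times matrix (r x c) -> vector of length c.
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(len(mat[0]))]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    return [e / sum(exps) for e in exps]

def make_model(vocab_size, d, rng):
    """Build the model already divided into a held-out part and a main part."""
    def part(out_features):
        p = {k: init_matrix(d, d, rng) for k in ("wq", "wk", "wv")}
        p["emb"] = init_matrix(vocab_size, d, rng)
        p["out"] = init_matrix(out_features, vocab_size, rng)
        return p
    # The main model's output layer reads the concatenated (2 * d) hidden states.
    return {"held_out": part(d), "main": part(2 * d)}

def attention_head(tokens, p):
    """Single causal self-attention head; returns one hidden state per token."""
    xs = [p["emb"][t] for t in tokens]
    qs = [matvec(x, p["wq"]) for x in xs]
    ks = [matvec(x, p["wk"]) for x in xs]
    vs = [matvec(x, p["wv"]) for x in xs]
    d = len(qs[0])
    hidden = []
    for i, q in enumerate(qs):
        # Position i attends only to positions 0..i.
        scores = [sum(a * b for a, b in zip(q, ks[j])) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        hidden.append([sum(w * vs[j][k] for j, w in enumerate(weights)) for k in range(d)])
    return hidden

def next_word_loss(hidden, w_out, tokens):
    """Mean cross-entropy of predicting tokens[i + 1] from hidden[i]."""
    total = 0.0
    for i in range(len(tokens) - 1):
        probs = softmax(matvec(hidden[i], w_out))
        total -= math.log(probs[tokens[i + 1]])
    return total / (len(tokens) - 1)

def held_out_forward(p, tokens):
    """First forward pass: returns the first loss and the held-out hidden states."""
    hidden = attention_head(tokens, p)
    return next_word_loss(hidden, p["out"], tokens), hidden

def main_loss(p, tokens, held_hidden):
    """Second forward pass: concatenate hidden states, then apply the output layer."""
    main_hidden = attention_head(tokens, p)
    concat = [h + m for h, m in zip(held_hidden, main_hidden)]  # list concatenation
    # The output layer stands in for "a subset of layers of the main model".
    return next_word_loss(concat, p["out"], tokens)

def finite_difference_sgd_step(p, loss_fn, lr, eps=1e-5):
    """Stand-in for a backward pass: central-difference gradients, then SGD."""
    grads = {}
    for name, mat in p.items():
        grads[name] = [[0.0] * len(row) for row in mat]
        for i, row in enumerate(mat):
            for j, original in enumerate(row):
                row[j] = original + eps
                up = loss_fn(p)
                row[j] = original - eps
                down = loss_fn(p)
                row[j] = original
                grads[name][i][j] = (up - down) / (2 * eps)
    for name, mat in p.items():
        for i, row in enumerate(mat):
            for j in range(len(row)):
                row[j] -= lr * grads[name][i][j]

def train_step(held, main, tokens, lr=0.5):
    loss1, held_hidden = held_out_forward(held, tokens)
    # Held-out hidden states are treated as constants for the main model (assumed),
    # so the second loss never changes the held-out parameters.
    loss2 = main_loss(main, tokens, held_hidden)
    finite_difference_sgd_step(held, lambda p: held_out_forward(p, tokens)[0], lr)
    finite_difference_sgd_step(main, lambda p: main_loss(p, tokens, held_hidden), lr)
    return loss1, loss2

if __name__ == "__main__":
    words = "the cat sat on the mat".split()
    vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
    tokens = [vocab[w] for w in words]
    model = make_model(len(vocab), d=2, rng=random.Random(0))
    for _ in range(50):
        loss1, loss2 = train_step(model["held_out"], model["main"], tokens)
    print(f"held-out loss {loss1:.3f}, main loss {loss2:.3f}")
```

Finite differences keep the sketch runnable with the standard library only. A practical implementation would more likely rely on a framework's automatic differentiation and would explicitly stop gradients through the held-out hidden states before concatenation.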