CPC G06N 3/08 (2013.01) [G06N 3/044 (2023.01); G06N 3/045 (2023.01)]
20 Claims
1. A method for training a first neural network to perform a document processing task, wherein the first neural network comprises one or more initial neural network layers and a first output layer, wherein the one or more initial neural network layers have parameters, and wherein the method comprises:
training a language model neural network to predict missing text inputs in text sequences that each include a respective plurality of text inputs, wherein the language model neural network comprises the one or more initial neural network layers and a language model output layer, and wherein training the language model neural network comprises determining pre-trained values of the parameters of the one or more initial neural network layers from initial values of the parameters of the one or more initial neural network layers; and
training the first neural network on a plurality of training documents to determine trained values of the parameters of the one or more initial neural network layers from the pre-trained values of the parameters of the one or more initial neural network layers.
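The two training stages recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. Everything in it is an assumption made for illustration: the "initial neural network layers" are reduced to a single averaged embedding table (`shared`), both output layers are plain softmax projections, the document processing task is taken to be two-class classification, and the token ids, dimensions, learning rate, and all function names are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def train_step(shared, head, tokens, target, lr=0.1):
    """One SGD step that updates the shared layer and the given output layer in place."""
    dim, n, out = len(head), len(tokens), len(head[0])
    # Shared (initial) layer: average the embeddings of the input tokens.
    hidden = [sum(shared[t][d] for t in tokens) / n for d in range(dim)]
    # Output layer: linear projection followed by softmax.
    logits = [sum(hidden[d] * head[d][k] for d in range(dim)) for k in range(out)]
    probs = softmax(logits)
    loss = -math.log(probs[target])
    # Gradient of the cross-entropy loss with respect to the logits.
    dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    # Backpropagate into the hidden vector before the head is modified.
    dhidden = [sum(head[d][k] * dlogits[k] for k in range(out)) for d in range(dim)]
    for d in range(dim):
        for k in range(out):
            head[d][k] -= lr * hidden[d] * dlogits[k]
    for t in tokens:
        for d in range(dim):
            shared[t][d] -= lr * dhidden[d] / n
    return loss


def pretrain_language_model(shared, lm_head, sequences, epochs=20):
    """Stage 1: predict each missing text input from the rest of its sequence."""
    for _ in range(epochs):
        for seq in sequences:
            for i, missing in enumerate(seq):
                context = seq[:i] + seq[i + 1:]
                train_step(shared, lm_head, context, missing)


def fine_tune(shared, task_head, documents, labels, epochs=20):
    """Stage 2: train the first network, starting from the pre-trained shared values."""
    for _ in range(epochs):
        for doc, label in zip(documents, labels):
            train_step(shared, task_head, doc, label)


rng = random.Random(0)
VOCAB, DIM, NUM_CLASSES = 6, 4, 2  # hypothetical sizes

initial = init_matrix(VOCAB, DIM, rng)           # initial parameter values
shared = [row[:] for row in initial]
lm_head = init_matrix(DIM, VOCAB, rng)           # language model output layer
pretrain_language_model(shared, lm_head, [[0, 1, 2], [3, 4, 5], [0, 2, 1]])
pretrained = [row[:] for row in shared]          # pre-trained parameter values

task_head = init_matrix(DIM, NUM_CLASSES, rng)   # first output layer
fine_tune(shared, task_head, [[0, 1, 2], [3, 4, 5]], [0, 1])
trained = shared                                 # trained parameter values
```

In this sketch the same `shared` table is passed to both stages, so the second stage begins from the values the first stage produced. Only the output layer is swapped between stages, and the language model output layer is not used once the first stage ends.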