US 11,868,888 B1
Training a document classification neural network
Andrew M. Dai, San Francisco, CA (US); and Quoc V. Le, Sunnyvale, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 13, 2021, as Appl. No. 17/549,746.
Application 17/549,746 is a continuation of application No. 16/735,453, filed on Jan. 6, 2020, granted, now 11,200,492.
Application 16/735,453 is a continuation of application No. 15/257,539, filed on Sep. 6, 2016, granted, now 10,528,866, issued on Jan. 7, 2020.
Claims priority of provisional application 62/214,790, filed on Sep. 4, 2015.
Int. Cl. G06N 3/04 (2023.01); G06N 3/08 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)

CPC G06N 3/08 (2013.01) [G06N 3/044 (2023.01); G06N 3/045 (2023.01)]

20 Claims

1. A method for training a first neural network to perform a document processing task, wherein the first neural network comprises one or more initial neural network layers and a first output layer, wherein the one or more initial neural network layers have parameters, and wherein the method comprises:

training a language model neural network to predict missing text inputs in text sequences that each include a respective plurality of text inputs, wherein the language model neural network comprises the one or more initial neural network layers and a language model output layer, and wherein training the language model neural network comprises determining pre-trained values of the parameters of the one or more initial neural network layers from initial values of the parameters of the one or more initial neural network layers; and

training the first neural network on a plurality of training documents to determine trained values of the parameters of the one or more initial neural network layers from the pre-trained values of the parameters of the one or more initial neural network layers.
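
By way of illustration only, the two training stages recited in claim 1 might be sketched as follows. The sketch rests on several assumptions that are not taken from the claim: the one or more initial neural network layers are reduced to a single embedding table (`shared`), prediction of missing text inputs is simplified to next-token prediction, the document processing task is binary classification over mean-pooled embeddings, and every name, dimension, learning rate, and data value is hypothetical. It is a toy instance of the pre-train-then-train pattern, not the patented implementation.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def pretrain_language_model(shared, lm_out, sequences, epochs=50, lr=0.1):
    """Stage 1: train shared layer + language model output layer (in place)."""
    dim = len(shared[0])
    for _ in range(epochs):
        for seq in sequences:
            for cur, nxt in zip(seq, seq[1:]):
                h = shared[cur][:]
                probs = softmax([sum(w * x for w, x in zip(row, h)) for row in lm_out])
                grad_logits = probs[:]          # d(cross-entropy)/d(logits)
                grad_logits[nxt] -= 1.0
                # Gradient w.r.t. the shared layer, taken before lm_out changes.
                grad_h = [sum(g * lm_out[v][j] for v, g in enumerate(grad_logits))
                          for j in range(dim)]
                for v, g in enumerate(grad_logits):
                    for j in range(dim):
                        lm_out[v][j] -= lr * g * h[j]
                for j in range(dim):
                    shared[cur][j] -= lr * grad_h[j]


def doc_features(shared, doc):
    """Mean-pool the shared-layer embeddings of a document's tokens."""
    dim = len(shared[0])
    return [sum(shared[t][j] for t in doc) / len(doc) for j in range(dim)]


def predict(shared, cls_out, doc):
    z = sum(w * x for w, x in zip(cls_out, doc_features(shared, doc)))
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(shared, cls_out, docs, labels, epochs=300, lr=0.5):
    """Stage 2: train shared layer + first output layer, starting from the
    pre-trained values already stored in `shared` (updated in place)."""
    dim = len(shared[0])
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            h = doc_features(shared, doc)
            p = 1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(cls_out, h))))
            err = p - y                         # d(log loss)/d(logit)
            grad_h = [err * w for w in cls_out]
            for j in range(dim):
                cls_out[j] -= lr * err * h[j]
            for t in doc:
                for j in range(dim):
                    shared[t][j] -= lr * grad_h[j] / len(doc)


rng = random.Random(0)
VOCAB, DIM = 6, 4                               # hypothetical sizes

shared = init_matrix(VOCAB, DIM, rng)           # initial neural network layer(s)
initial_values = [row[:] for row in shared]

# Stage 1: language model = shared layer + language model output layer.
lm_out = init_matrix(VOCAB, DIM, rng)
sequences = [[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5]]   # toy unlabeled text
pretrain_language_model(shared, lm_out, sequences)
pretrained_values = [row[:] for row in shared]

# Stage 2: first neural network = same shared layer + first output layer.
cls_out = [rng.uniform(-0.1, 0.1) for _ in range(DIM)]
docs = [[0, 1, 2], [1, 2, 0], [3, 4, 5], [5, 3, 4]]    # toy training documents
labels = [0, 0, 1, 1]
train_classifier(shared, cls_out, docs, labels)
```

In this sketch the language model output layer (`lm_out`) is simply set aside after the first stage, and only the shared parameters carry over, so the second stage starts from `pretrained_values` rather than from `initial_values`.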