US 12,354,004 B2
Generating vector representations of documents
Quoc V. Le, Sunnyvale, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 22, 2023, as Appl. No. 18/395,172.
Application 18/395,172 is a continuation of application No. 16/523,766, filed on Jul. 26, 2019, granted, now 11,853,879.
Application 16/523,766 is a continuation of application No. 14/609,869, filed on Jan. 30, 2015, granted, now 10,366,327, issued on Jul. 30, 2019.
Claims priority of provisional application 61/934,674, filed on Jan. 31, 2014.
Prior Publication US 2024/0202519 A1, Jun. 20, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G06F 16/583 (2019.01); G06F 40/284 (2020.01); G06N 3/04 (2023.01); G06N 3/084 (2023.01)

CPC G06N 3/08 (2013.01) [G06F 16/583 (2019.01); G06F 40/284 (2020.01); G06N 3/04 (2013.01); G06N 3/084 (2013.01)]

21 Claims

1. A method comprising:

obtaining a new document, wherein the new document includes a plurality of sequences of words, and, for each sequence of words, a word that is in another sequence of words in the new document and that follows a last word in the sequence of words in the new document;

generating a vector representation of the new document using a trained neural network system, wherein generating the vector representation of the new document using the trained neural network system comprises, for each iteration step of multiple iteration steps:

obtaining a current sequence of words from the plurality of sequences of words;

processing (i) data identifying the new document and (ii) the current sequence of words by the trained neural network system having an embedding layer and one or more other layers and in accordance with (i) trained values of a set of word parameters of the embedding layer and (ii) current values of a set of document parameters of the embedding layer to generate a respective word score for each word in a pre-determined set of words;

computing a gradient with respect to the vector representation of an error function that measures an error between the respective word scores and a target set of word scores that identifies a word that is in another sequence of words in the new document and that follows a last word in the current sequence of words in the new document; and

training the trained neural network system on the new document to adjust the current values of the set of document parameters of the embedding layer of the trained neural network system based on the gradient using gradient descent while holding the trained values of the set of word parameters of the embedding layer of the trained neural network system fixed; and

processing, by a text classification system, an input comprising the vector representation of the new document to generate a classification output for the new document.
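
The iteration recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation. It assumes a single softmax output layer, a fixed-size context window whose word vectors are averaged and concatenated with the document vector, a zero-initialized document vector, and a cross-entropy error function. The names `infer_document_vector`, `classify`, `W`, `U`, `b`, `C`, and `c` are hypothetical and do not appear in the patent. Only the document vector is updated by gradient descent; the trained word embeddings and output weights are held fixed.

```python
import numpy as np

def infer_document_vector(word_ids, W, U, b, window=3, steps=20, lr=0.1):
    """Fit a vector for one new document while trained weights stay fixed.

    word_ids: list of int vocabulary indices for the new document.
    W: (vocab, dim) trained word embeddings (assumed layout, held fixed).
    U: (vocab, 2 * dim) trained output weights, b: (vocab,) bias (held fixed).
    Returns the document vector and the summed loss of each pass.
    """
    dim = W.shape[1]
    d = np.zeros(dim)  # document parameters: the only values that are adjusted
    losses = []
    for _ in range(steps):
        total = 0.0
        for start in range(len(word_ids) - window):
            context = word_ids[start:start + window]   # current sequence of words
            target = word_ids[start + window]          # word following its last word
            # Embedding layer output: document vector plus averaged word vectors.
            h = np.concatenate([d, W[context].mean(axis=0)])
            # One score per word in the vocabulary, normalized with a softmax.
            logits = U @ h + b
            logits -= logits.max()
            probs = np.exp(logits) / np.exp(logits).sum()
            total += -np.log(probs[target])
            # Gradient of the cross-entropy error with respect to d only.
            err = probs.copy()
            err[target] -= 1.0
            grad_d = U[:, :dim].T @ err
            d -= lr * grad_d                           # gradient descent step
        losses.append(total)
    return d, losses

def classify(doc_vec, C, c):
    """Hypothetical linear text classifier that consumes the document vector."""
    return int(np.argmax(C @ doc_vec + c))
```

In this sketch the document vector can only shift the word scores in a way that does not depend on the context, so it ends up summarizing which words the document tends to use. The resulting vector is then passed to a separate classifier, mirroring the final step of the claim.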