| CPC G06N 3/08 (2013.01) [G06F 16/583 (2019.01); G06F 40/284 (2020.01); G06N 3/04 (2013.01); G06N 3/084 (2013.01)] | 21 Claims |

|
1. A method comprising:
obtaining a new document, wherein the new document includes a plurality of sequences of words, and, for each sequence of words, a word that is in another sequence of words in the new document and that follows a last word in the sequence of words in the new document;
generating a vector representation of the new document using a trained neural network system, wherein generating the vector representation of the new document using the trained neural network system comprises, for each iteration step of multiple iteration steps:
obtaining a current sequence of words from the plurality of sequences of words;
processing (i) data identifying the new document and (ii) the current sequence of words by the trained neural network system having an embedding layer and one or more other layers and in accordance with (i) trained values of a set of word parameters of the embedding layer and (ii) current values of a set of document parameters of the embedding layer to generate a respective word score for each word in a pre-determined set of words;
computing a gradient with respect to the vector representation of an error function that measures an error between the respective word scores and a target set of word scores that identifies a word that is in another sequence of words in the new document and that follows a last word in the current sequence of words in the new document; and
training the trained neural network system on the new document to adjust the current values of the set of document parameters of the embedding layer of the trained neural network system based on the gradient using gradient descent while holding the trained values of the set of word parameters of the embedding layer of the trained neural network system fixed; and
processing, by a text classification system, an input comprising the vector representation of the new document to generate a classification output for the new document.
|
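For illustration only, the inference procedure recited in claim 1 can be sketched in Python with NumPy. The sketch is not the claimed implementation. It assumes an embedding layer that concatenates the document vector with the averaged word vectors of the current sequence, a single softmax output layer, a cross-entropy error function, a zero initial document vector, and a linear classifier. All names (`make_windows`, `word_scores`, `mean_loss`, `infer_doc_vector`, `classify`) and all hyperparameters are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def make_windows(token_ids, window):
    """Pair each sequence of `window` word ids with the word that follows it."""
    return [
        (token_ids[i:i + window], token_ids[i + window])
        for i in range(len(token_ids) - window)
    ]


def word_scores(doc_vec, context, word_emb, out_w, out_b):
    """Assumed architecture: concatenate the document vector with the mean
    context word vector, then apply one softmax layer over the vocabulary."""
    hidden = np.concatenate([doc_vec, word_emb[context].mean(axis=0)])
    return softmax(out_w @ hidden + out_b)


def mean_loss(doc_vec, pairs, word_emb, out_w, out_b):
    """Average cross-entropy between the word scores and the true next words."""
    return float(np.mean([
        -np.log(word_scores(doc_vec, ctx, word_emb, out_w, out_b)[tgt])
        for ctx, tgt in pairs
    ]))


def infer_doc_vector(token_ids, word_emb, out_w, out_b, doc_dim,
                     window=3, epochs=20, lr=0.1):
    """Gradient descent on the document vector only; word_emb, out_w and
    out_b are read but never written, so the trained values stay fixed."""
    pairs = make_windows(token_ids, window)
    doc_vec = np.zeros(doc_dim)  # assumed initialisation of document parameters
    for _ in range(epochs):
        for ctx, tgt in pairs:
            scores = word_scores(doc_vec, ctx, word_emb, out_w, out_b)
            # Gradient of cross-entropy w.r.t. the logits: scores minus one-hot target.
            d_logits = scores.copy()
            d_logits[tgt] -= 1.0
            # Back-propagate through the output-weight columns that multiply doc_vec.
            grad_doc = out_w[:, :doc_dim].T @ d_logits
            doc_vec -= lr * grad_doc
    return doc_vec


def classify(doc_vec, clf_w, clf_b):
    """Hypothetical linear text classifier applied to the document vector."""
    return int(np.argmax(clf_w @ doc_vec + clf_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab, doc_dim, word_dim = 8, 4, 5
    # Random stand-ins for the trained word parameters and output layer.
    word_emb = rng.normal(scale=0.1, size=(vocab, word_dim))
    out_w = rng.normal(scale=0.1, size=(vocab, doc_dim + word_dim))
    out_b = np.zeros(vocab)
    new_document = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
    vec = infer_doc_vector(new_document, word_emb, out_w, out_b, doc_dim)
    label = classify(vec, rng.normal(size=(2, doc_dim)), np.zeros(2))
    print("document vector:", vec, "class:", label)
```

In this sketch the word parameters stay fixed simply because the update loop never writes to `word_emb`, `out_w`, or `out_b`. A framework-based implementation would more likely freeze those parameters explicitly and pass only the document parameters to the optimizer.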