CPC G06F 40/289 (2020.01) [G06F 40/30 (2020.01); G06F 40/40 (2020.01)] | 20 Claims |
12. A method for topic modeling, comprising:
encoding words of a document using an embedding matrix to obtain word embeddings for the document, wherein the words of the document comprise a subset of words in a vocabulary;
generating a sequence of hidden representations corresponding to the word embeddings using a sequential encoder, wherein the sequence of hidden representations comprises an order based on an order of the words in the document;
computing a context vector for the document based on the sequence of hidden representations;
generating a latent vector based on the context vector using an auto-encoder;
computing a loss function based on the latent vector;
updating parameters of the embedding matrix and a topic attention network based on the loss function; and
predicting, using the embedding matrix and the topic attention network, a set of words including a topic for an input document based on a context vector for the input document.
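The steps recited in claim 12 can be illustrated with a minimal NumPy sketch. Everything specific in it is an assumption made for illustration and is not taken from the claim: the sequential encoder is a plain tanh recurrent network, the topic attention network is reduced to a single attention vector `w_att`, the auto-encoder is a variational one with a Gaussian latent, the loss is a bag-of-words reconstruction term plus a KL term, and all names and dimensions (`V`, `E`, `H`, `K`, `topic_emb`, `predict_topic_words`) are hypothetical. The parameter-update step is not implemented here; it would normally be handled by an automatic-differentiation framework applied to `loss_fn`.

```python
import numpy as np

rng = np.random.default_rng(0)
V, E, H, K = 50, 16, 12, 4  # hypothetical sizes: vocabulary, embedding, hidden, topics

# All parameters are randomly initialised stand-ins (assumed shapes).
params = {
    "emb": rng.normal(0, 0.1, (V, E)),        # embedding matrix
    "W_xh": rng.normal(0, 0.1, (E, H)),       # assumed RNN input weights
    "W_hh": rng.normal(0, 0.1, (H, H)),       # assumed RNN recurrent weights
    "w_att": rng.normal(0, 0.1, H),           # stand-in for the topic attention network
    "W_mu": rng.normal(0, 0.1, (H, K)),       # auto-encoder: latent mean
    "W_lv": rng.normal(0, 0.1, (H, K)),       # auto-encoder: latent log-variance
    "topic_emb": rng.normal(0, 0.1, (K, E)),  # assumed topic embeddings
}

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def context_vector(word_ids, p):
    """Embed the words, run them through the RNN in document order, then attend."""
    x = p["emb"][word_ids]                      # (T, E) word embeddings
    h, hs = np.zeros(H), []
    for x_t in x:                               # hidden states follow word order
        h = np.tanh(x_t @ p["W_xh"] + h @ p["W_hh"])
        hs.append(h)
    hs = np.stack(hs)                           # (T, H)
    alpha = softmax(hs @ p["w_att"])            # (T,) attention weights
    return alpha @ hs                           # (H,) context vector

def latent_vector(context, p, rng):
    """Sample a latent vector with the reparameterisation trick (assumed VAE)."""
    mu, logvar = context @ p["W_mu"], context @ p["W_lv"]
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    return z, mu, logvar

def topic_word_dist(p):
    """Per-topic distribution over the vocabulary, tied to the embedding matrix."""
    return softmax(p["topic_emb"] @ p["emb"].T, axis=1)  # (K, V)

def loss_fn(word_ids, p, rng):
    """Bag-of-words reconstruction loss plus KL divergence to a unit Gaussian."""
    z, mu, logvar = latent_vector(context_vector(word_ids, p), p, rng)
    recon = softmax(z) @ topic_word_dist(p)     # (V,) predicted word distribution
    bow = np.bincount(word_ids, minlength=V)    # observed word counts
    nll = -(bow * np.log(recon + 1e-12)).sum()
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
    return nll + kl

def predict_topic_words(word_ids, p, top_n=5):
    """Pick the dominant topic for a document and return its top-weighted word ids."""
    theta = softmax(context_vector(word_ids, p) @ p["W_mu"])  # latent mean, no sampling
    k = int(theta.argmax())
    top = np.argsort(topic_word_dist(p)[k])[::-1][:top_n]
    return k, top.tolist()

doc = np.array([3, 7, 7, 12, 1])                # word indices into the vocabulary
loss = loss_fn(doc, params, rng)
topic, words = predict_topic_words(doc, params)
```

With untrained parameters the predicted topic and words are arbitrary; the sketch only shows how data flows from word indices to a context vector, a latent vector, a loss value, and a word-set prediction.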