CPC G06V 30/413 (2022.01) [G06F 16/3347 (2019.01); G06F 18/2155 (2023.01); G06F 18/22 (2023.01); G06F 40/30 (2020.01); G06N 3/0455 (2023.01); G06N 3/088 (2013.01); G06V 30/274 (2022.01); G06V 30/416 (2022.01); G06F 16/2462 (2019.01); G06F 16/35 (2019.01); G06N 3/045 (2023.01)] | 19 Claims |
1. A method of operating a system for classifying a document, the method comprising:
obtaining a plurality of word embeddings from a plurality of words constituting a plurality of sentences included in the document;
providing, to a semantic analysis model, the plurality of word embeddings, wherein the semantic analysis model generates, based on the plurality of word embeddings, a plurality of document features representing the document, the plurality of document features including a keyword similarity and a sentence similarity;
extracting, from the semantic analysis model, the plurality of document features;
providing, to an inference model, the plurality of word embeddings and the plurality of document features, wherein the inference model evaluates the document based on the plurality of word embeddings and the plurality of document features and generates an evaluation result of the document;
extracting, from the inference model, the evaluation result; and
outputting the evaluation result,
wherein, for generating the evaluation result of the document, the inference model performs:
generating, using a hierarchical attention network (HAN), a document vector from the plurality of word embeddings;
concatenating the keyword similarity and sentence similarity with the document vector to generate a concatenated vector; and
generating, using a fully connected layer, the evaluation result of the document based on the concatenated vector.
|
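The inference steps recited at the end of claim 1 (document vector from a HAN, concatenation with the two similarity features, fully connected layer) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a simplified hierarchical attention that pools word embeddings into sentence vectors and sentence vectors into a document vector using dot-product attention, and it omits the recurrent encoders a full HAN normally includes. The names `attention_pool`, `document_vector`, `evaluate`, and `init_params`, the random weights, and the softmax output over classes are all hypothetical. The keyword similarity and sentence similarity are taken as precomputed scalars, since the claim does not say how the semantic analysis model derives them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, context):
    """Weighted sum of vectors; weights are a softmax of dot products with a context vector."""
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def document_vector(sentences, word_ctx, sent_ctx):
    """Simplified hierarchical attention: words -> sentence vectors -> document vector.

    `sentences` is a list of sentences, each a list of word embeddings.
    """
    sentence_vectors = [attention_pool(words, word_ctx) for words in sentences]
    return attention_pool(sentence_vectors, sent_ctx)


def evaluate(sentences, keyword_sim, sentence_sim, params):
    """Concatenate the two document features with the document vector, then apply one dense layer."""
    doc_vec = document_vector(sentences, params["word_ctx"], params["sent_ctx"])
    concatenated = doc_vec + [keyword_sim, sentence_sim]
    logits = [dot(row, concatenated) + b
              for row, b in zip(params["fc_w"], params["fc_b"])]
    # Softmax over classes is an assumption; the claim only requires an evaluation result.
    return softmax(logits)


def init_params(dim, num_classes, seed=0):
    """Hypothetical untrained weights, for illustration only."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    return {
        "word_ctx": rand(dim),
        "sent_ctx": rand(dim),
        "fc_w": [rand(dim + 2) for _ in range(num_classes)],  # +2 for the two similarities
        "fc_b": [0.0] * num_classes,
    }


# Toy document: 2 sentences, 3 words each, 4-dimensional embeddings (made-up values).
rng = random.Random(1)
doc = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)] for _ in range(2)]
params = init_params(dim=4, num_classes=2)
probs = evaluate(doc, keyword_sim=0.8, sentence_sim=0.6, params=params)
```

In this sketch the fully connected layer has `dim + 2` inputs because the concatenated vector appends the two scalar features to the document vector. A trained system would learn the context vectors and layer weights rather than sample them at random.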