| CPC G06F 16/906 (2019.01) [G06F 16/93 (2019.01); G06F 40/216 (2020.01); G06F 40/295 (2020.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01)] | 16 Claims |

|
1. A method comprising:
receiving, by one or more computing devices, a set of documents and metadata for each document in the set of documents, wherein the set of documents correspond to a domain;
generating, by the one or more computing devices, a set of word embeddings for each document of the set of documents, each word embedding including one or more words from a respective document;
tokenizing, by the one or more computing devices, each word embedding of the set of word embeddings into a set of segments, each segment including a word from the word embedding;
training, by the one or more computing devices, a learning model to classify each document of the set of documents of the domain by recursively, during each of a number of iterations of the training:
breaking down, by the one or more computing devices, each of the segments of the set of segments of each document of the set of documents into a set of features;
assigning, by the one or more computing devices, a part-of-speech tag to each of the segments of the set of segments for each document of the set of documents based on predetermined weights assigned to each feature of the set of features of a corresponding segment;
assigning, by the one or more computing devices, a dependency tag to each of the segments of the set of segments of each document of the set of documents based on the part-of-speech tag assigned to the corresponding segment and the predetermined weights assigned to each feature of the set of features of the corresponding segment;
assigning, by the one or more computing devices, a Named Entity Recognition (NER) label from a set of predefined labels corresponding to the domain to each of the segments of the set of segments of each document of the set of documents based on the part-of-speech tag and the dependency tag assigned to the corresponding segment and the predetermined weights assigned to each feature of the set of features of the corresponding segment; and
validating, by the one or more computing devices, the assigned NER labels by comparing the metadata for each document to the assigned NER labels of the respective document.
|
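The tagging cascade recited in claim 1 (features, then part-of-speech tag, then dependency tag, then NER label, then validation against metadata) can be illustrated with a minimal sketch. This is not the patented implementation. The weight tables, feature names, label sets, whitespace tokenization, and the helper names `extract_features`, `best_label`, `tag_document`, and `validate` are all hypothetical, and the iterative training loop is omitted; the sketch only shows how each stage can consume the previous stage's output together with predetermined per-feature weights.

```python
from typing import Dict, List

Weights = Dict[str, Dict[str, float]]

# Hypothetical hand-set weights; the claim only states that the weights
# assigned to each feature are "predetermined".
POS_WEIGHTS: Weights = {
    "bias": {"NOUN": 1.0},
    "is_capitalized": {"PROPN": 2.0, "NOUN": 0.5},
    "is_digit": {"NUM": 3.0},
    "suffix_ed": {"VERB": 2.0},
}
DEP_WEIGHTS: Weights = {
    "bias": {"dep": 0.5},
    "pos=VERB": {"ROOT": 2.0},
    "pos=PROPN": {"nsubj": 1.0},
    "pos=NUM": {"nummod": 1.0},
}
NER_WEIGHTS: Weights = {
    "bias": {"O": 1.0},
    "pos=PROPN": {"ORG": 2.0},
    "pos=NUM": {"DATE": 2.0},
}


def extract_features(segment: str) -> List[str]:
    """Break one segment down into a set of surface features."""
    feats = ["bias"]
    if segment[:1].isupper():
        feats.append("is_capitalized")
    if segment.isdigit():
        feats.append("is_digit")
    if segment.lower().endswith("ed"):
        feats.append("suffix_ed")
    return feats


def best_label(features: List[str], weights: Weights) -> str:
    """Sum the weights of the active features and return the top-scoring label."""
    totals: Dict[str, float] = {}
    for feat in features:
        for label, weight in weights.get(feat, {}).items():
            totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def tag_document(text: str) -> List[Dict[str, str]]:
    """Assign a POS tag, then a dependency tag, then an NER label per segment."""
    rows = []
    for segment in text.split():  # whitespace split stands in for tokenization
        feats = extract_features(segment)
        pos = best_label(feats, POS_WEIGHTS)
        # The dependency tag depends on the POS tag plus the segment features.
        dep = best_label(feats + [f"pos={pos}"], DEP_WEIGHTS)
        # The NER label depends on both earlier tags plus the segment features.
        ner = best_label(feats + [f"pos={pos}", f"dep={dep}"], NER_WEIGHTS)
        rows.append({"segment": segment, "pos": pos, "dep": dep, "ner": ner})
    return rows


def validate(rows: List[Dict[str, str]], metadata: Dict[str, List[str]]) -> float:
    """Return the fraction of metadata entries matched by an assigned NER label."""
    assigned = {(row["segment"], row["ner"]) for row in rows}
    expected = [(value, label) for label, values in metadata.items() for value in values]
    if not expected:
        return 1.0
    return sum(1 for pair in expected if pair in assigned) / len(expected)


if __name__ == "__main__":
    rows = tag_document("Acme acquired Widget in 2019")
    for row in rows:
        print(row)
    print(validate(rows, {"ORG": ["Acme", "Widget"], "DATE": ["2019"]}))
```

Feeding each earlier tag forward as an extra feature (`pos=...`, `dep=...`) is one simple way to make a later stage depend on an earlier one, as the claim requires; a trained system would learn or adjust these weights rather than hard-code them.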