CPC G06F 40/20 (2020.01) [G06F 18/2148 (2023.01); G06F 18/2155 (2023.01); G06N 20/20 (2019.01)]
20 Claims
1. A system for improving natural language processing, comprising:
one or more processors; and
memory in communication with the one or more processors and storing instructions that are configured to cause the system to:
receive a corpus of labeled text documents;
train a first machine learning model based on the labeled text documents;
train a second machine learning model based on a first subset of the labeled text documents;
train a third machine learning model based on a second subset of the labeled text documents;
receive a set of text documents;
generate, using the first machine learning model, a first classification of each text document in the set of text documents as positive or negative to define: (i) a first subset of text documents that comprise a first portion of the set of text documents that have a first classification that is positive and (ii) a second subset of text documents that comprise a second portion of the set of text documents that have a first classification that is negative;
generate, using the second machine learning model, a second classification of each text document in the first subset of text documents as positive or negative;
generate, using the third machine learning model, a third classification of each text document in the second subset of text documents as positive or negative; and
modify the first classification of one or more text documents of the set of text documents, based on the second classification and the third classification, to create a fourth classification by:
when the second classification of one or more text documents in the first subset of text documents is negative, replacing the first classification of the one or more text documents in the first subset of text documents with the second classification to change a positive classification associated with the one or more text documents in the first subset of text documents to be a negative classification; and
when the third classification of one or more text documents in the second subset of text documents is positive, replacing the first classification of the one or more text documents in the second subset of text documents with the third classification to change a negative classification associated with the one or more text documents in the second subset of text documents to be a positive classification;
wherein the first machine learning model comprises a naïve Bayes model, and
wherein the second machine learning model comprises a term frequency-inverse document frequency (TF-IDF) gradient boosting machine (GBM) model.
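The inference cascade recited in claim 1 can be sketched in Python as below. This is a minimal, hypothetical illustration, not the patentee's implementation. Each model is represented as a plain callable that returns `True` for positive and `False` for negative. The naïve Bayes and TF-IDF GBM models named in the claim are replaced by keyword stand-ins only so the example runs. The function name `cascade_classify`, the `Classifier` alias, and the stand-in models are assumptions. Training, including how the first and second subsets of labeled documents are chosen, is not shown because the claim does not specify it.

```python
from typing import Callable, List, Sequence

# Hypothetical alias: a "model" maps a document to True (positive) or False (negative).
Classifier = Callable[[str], bool]


def cascade_classify(
    docs: Sequence[str],
    first_model: Classifier,
    second_model: Classifier,
    third_model: Classifier,
) -> List[bool]:
    """Return the corrected ("fourth") classification for each document."""
    # First classification: partitions the documents into positives and negatives.
    first = [first_model(doc) for doc in docs]
    final = list(first)
    for i, doc in enumerate(docs):
        if first[i]:
            # First subset (initially positive): the second model can only demote.
            if not second_model(doc):
                final[i] = False
        else:
            # Second subset (initially negative): the third model can only promote.
            if third_model(doc):
                final[i] = True
    return final


# Hypothetical keyword stand-ins for the trained models recited in the claim.
def first_stub(doc: str) -> bool:   # stands in for the naive Bayes model
    return "refund" in doc


def second_stub(doc: str) -> bool:  # stands in for the TF-IDF GBM model
    return "denied" not in doc


def third_stub(doc: str) -> bool:   # stands in for the third model
    return "great" in doc


if __name__ == "__main__":
    docs = ["refund approved", "refund denied", "great service", "terrible delay"]
    print(cascade_classify(docs, first_stub, second_stub, third_stub))
    # → [True, False, True, False]
```

Because the first model's output decides which documents each corrector sees, the second model only ever reviews first-pass positives and the third only first-pass negatives. In effect, the second model acts as a false-positive filter and the third as a false-negative filter. With real models one might, for example, wrap scikit-learn's `MultinomialNB` for the first model and a `TfidfVectorizer` feeding a `GradientBoostingClassifier` for the second, but that pairing is likewise an assumption.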