CPC G06F 16/35 (2019.01) [G06F 16/338 (2019.01); G06N 20/00 (2019.01)] | 9 Claims |
1. A computer-implemented method for training a classifier model for data classification, in particular in response to a search query, comprising:
a) obtaining a dataset that comprises a seed set of labeled data representing a training dataset;
b) training the classifier model by using the training dataset to fit parameters of the classifier model;
c) evaluating a quality of the classifier model using a test dataset that comprises unlabeled data from the obtained dataset to generate a classifier confidence score indicative of a probability of correctness of the classifier model working on the test dataset;
d) determining a global risk value of misclassification and a reward value based on the classifier confidence score on the test dataset;
e) iteratively updating the parameters of the classifier model and performing steps b) to d) until the global risk value falls within a predetermined risk limit value or an expected reward value is reached to obtain a trained classifier model for data classification,
wherein step d) further comprises:
d1) generating a classifier confidence score indicative of a probability of correctness of the classifier model working on the test dataset;
d2) computing a classifier metric at different thresholds on the classifier confidence score, the classifier metric representing a measure of a test's accuracy;
d3) determining a reference threshold that corresponds to a peak in a distribution of the classifier metric over the threshold on the classifier confidence score;
d4) determining a threshold range that defines a recommended window according to a predefined criterion, wherein the reference threshold is located within the threshold range; and
d5) computing the reward value at different thresholds on the classifier confidence score, and
wherein the reward value includes at least one of a measure of information gain and a measure of decrease in uncertainty.
|
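Illustrative note (not part of the claim): sub-steps d2) to d5) can be sketched in Python as follows. Everything specific here is an assumption made for illustration only: the F1 score stands in for the "classifier metric", the "predefined criterion" for the recommended window is taken to be "metric within a fixed tolerance of its peak", and the reward is modeled as the mean decrease in uncertainty (one bit minus the binary entropy of each accepted confidence score). The claim fixes none of these choices, and the function names `f1_at_threshold`, `reward_at_threshold`, and `analyze_thresholds` are hypothetical. The sketch also assumes reference labels are available for computing the metric, which the claim does not state.

```python
import math


def f1_at_threshold(scores, labels, threshold):
    """F1 score when items with score >= threshold are predicted positive."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def reward_at_threshold(scores, threshold):
    """Assumed reward: mean (1 - entropy) over items accepted at the threshold."""
    accepted = [s for s in scores if s >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - binary_entropy(s) for s in accepted) / len(accepted)


def analyze_thresholds(scores, labels, thresholds, tolerance=0.05):
    """Sketch of d2) to d5); `thresholds` must be sorted in ascending order."""
    # d2) metric at each candidate threshold
    metric = [f1_at_threshold(scores, labels, t) for t in thresholds]

    # d3) reference threshold at the peak of the metric curve
    peak = max(range(len(thresholds)), key=metric.__getitem__)

    # d4) recommended window: contiguous thresholds around the peak whose
    #     metric stays within `tolerance` of the peak value (assumed criterion)
    floor = metric[peak] - tolerance
    lo = peak
    while lo > 0 and metric[lo - 1] >= floor:
        lo -= 1
    hi = peak
    while hi < len(thresholds) - 1 and metric[hi + 1] >= floor:
        hi += 1

    # d5) reward at each candidate threshold
    reward = [reward_at_threshold(scores, t) for t in thresholds]

    return {
        "metric": metric,
        "reference_threshold": thresholds[peak],
        "window": (thresholds[lo], thresholds[hi]),
        "reward": reward,
    }


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions above.
    scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
    labels = [0, 0, 1, 1, 1, 0]
    thresholds = [0.2, 0.3, 0.5, 0.7, 0.85]
    print(analyze_thresholds(scores, labels, thresholds, tolerance=0.15))
```

By construction the window always contains the reference threshold, which mirrors the requirement in d4). A different metric or window criterion could be substituted without changing the overall structure.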