US 11,914,629 B2
Practical supervised classification of data sets
Arunav Mishra, Ludwigshafen (DE); Henning Schwabe, Ludwigshafen (DE); and Lalita Shaki Uribe Ordonez, Ludwigshafen (DE)
Assigned to BASF SE, Ludwigshafen (DE)
Filed by BASF SE, Ludwigshafen (DE)
Filed on Aug. 5, 2021, as Appl. No. 17/394,994.
Claims priority of application No. 20190061 (EP), filed on Aug. 7, 2020.
Prior Publication US 2022/0043850 A1, Feb. 10, 2022
Int. Cl. G06F 16/35 (2019.01); G06F 16/338 (2019.01); G06N 20/00 (2019.01)

CPC G06F 16/35 (2019.01) [G06F 16/338 (2019.01); G06N 20/00 (2019.01)]

9 Claims

1. A computer-implemented method for training a classifier model for data classification, in particular in response to a search query, comprising:

a) obtaining a dataset that comprises a seed set of labeled data representing a training dataset;

b) training the classifier model by using the training dataset to fit parameters of the classifier model;

c) evaluating a quality of the classifier model using a test dataset that comprises unlabeled data from the obtained dataset to generate a classifier confidence score indicative of a probability of correctness of the classifier model working on the test dataset;

d) determining a global risk value of misclassification and a reward value based on the classifier confidence score on the test dataset;

e) iteratively updating the parameters of the classifier model and performing steps b) to d) until the global risk value falls within a predetermined risk limit value or an expected reward value is reached to obtain a trained classifier model for data classification,

wherein step d) further comprises:

d1) generating a classifier confidence score indicative of a probability of correctness of the classifier model working on the test dataset;

d2) computing a classifier metric at different thresholds on classifier confidence score, the classifier metrics representing a measure of a test's accuracy;

d3) determining a reference threshold that corresponds to a peak in a distribution of the classifier metric over the threshold on classifier confidence score;

d4) determining a threshold range that defines a recommended window according to a predefined criteria, wherein the reference threshold is located within the threshold range; and

d5) computing the reward value at different thresholds on classifier confidence score, and

wherein the reward value includes at least one of a measure of information gain and a measure of decrease in uncertainty.
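
Steps d1) to d5) can be illustrated with a minimal sketch. It is not the patented implementation, and the claim does not fix any of the concrete choices made here. The sketch assumes that F1 is the classifier metric, that the "predefined criteria" for the recommended window is a tolerance below the peak metric, that the reward is the total decrease in binary entropy (in bits) over the items accepted at a threshold, and that reference labels are available for the scored items so the metric can be computed. All function names and parameters (`f1_at_threshold`, `reward_at_threshold`, `analyze_thresholds`, `tolerance`) are hypothetical.

```python
import math


def f1_at_threshold(scores, labels, threshold):
    """F1 when items with score >= threshold are predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def reward_at_threshold(scores, threshold):
    """Assumed reward: summed decrease in uncertainty, relative to a
    maximally uncertain 1-bit prior, over items accepted at this threshold."""
    return sum(1.0 - binary_entropy(s) for s in scores if s >= threshold)


def analyze_thresholds(scores, labels, thresholds, tolerance=0.1):
    # d2) classifier metric at each threshold on the confidence score
    metric = [f1_at_threshold(scores, labels, t) for t in thresholds]

    # d3) reference threshold at the peak of the metric curve
    peak = max(range(len(thresholds)), key=lambda i: metric[i])

    # d4) recommended window: contiguous thresholds around the peak whose
    # metric stays within `tolerance` of the peak (assumed criterion)
    cutoff = metric[peak] - tolerance
    lo = hi = peak
    while lo > 0 and metric[lo - 1] >= cutoff:
        lo -= 1
    while hi < len(thresholds) - 1 and metric[hi + 1] >= cutoff:
        hi += 1

    # d5) reward value at each threshold
    rewards = [reward_at_threshold(scores, t) for t in thresholds]

    return {
        "metric": metric,
        "reference_threshold": thresholds[peak],
        "window": (thresholds[lo], thresholds[hi]),
        "rewards": rewards,
    }


# d1) confidence scores, here made-up values standing in for model output
scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.3, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
thresholds = [i / 10 for i in range(1, 10)]

result = analyze_thresholds(scores, labels, thresholds)
print(result["reference_threshold"], result["window"])
```

In this sketch the reference threshold always lies inside the window, because the window is grown outward from the peak. The reward can only stay constant or fall as the threshold rises, since fewer items are accepted, so comparing the reward curve with the metric curve shows what is given up by choosing a stricter threshold within the window.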