US 11,853,908 B2
Data-analysis-based, noisy labeled and unlabeled datapoint detection and rectification for machine-learning
Shaikh Shahriar Quader, Scarborough (CA); Mona Nashaat Ali Elmowafy, Edmonton (CA); and Darrell Christopher Reimer, Tarrytown, NY (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on May 13, 2020, as Appl. No. 15/930,900.
Prior Publication US 2021/0357776 A1, Nov. 18, 2021
Int. Cl. G06N 5/04 (2023.01); G06N 20/20 (2019.01); G06N 7/01 (2023.01)
CPC G06N 5/04 (2013.01) [G06N 20/20 (2019.01); G06N 7/01 (2023.01)]
18 Claims
 
1. A computer-implemented method comprising:
training a machine-learning model using a training dataset including both noisy labeled datapoints and unlabeled datapoints, the training comprising:
detecting, by one or more processors, noise in the training dataset, the detecting applying ensemble machine-learning and a generative model to the training dataset to detect noisy labeled datapoints in the training dataset, and to also create a clean dataset with preliminary labels added for the unlabeled datapoints in the training dataset;
generating a datapoint pool including the detected noisy labeled datapoints and the unlabeled datapoints of the training dataset; and
data-driven rectifying of one or more selected datapoints of the generated datapoint pool including the detected noisy labeled datapoints and the unlabeled datapoints of the training dataset, the data-driven rectifying including using, by the one or more processors, meta-data-driven active learning and the clean dataset to facilitate generating an active-learned dataset with true labels added for the one or more selected datapoints of the datapoint pool including the detected noisy labeled datapoints and the unlabeled datapoints of the training dataset; and
wherein training the machine-learning model further includes using, at least in part, the generated active-learned dataset in training the machine-learning model.
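The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and every name in it (`detect_noise`, `fit_gaussians`, `rectify`, `true_label`) is hypothetical. It assumes 1-D features, binary labels, a leave-one-out k-NN vote as the ensemble, class-conditional Gaussians as the generative model, and plain uncertainty sampling standing in for the meta-data-driven active learning, which the claim does not specify.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k nearest labeled points (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def detect_noise(labeled, ks=(1, 3, 5)):
    """Ensemble vote: flag a point when most leave-one-out k-NN members reject its label."""
    noisy, clean = [], []
    for i, (x, y) in enumerate(labeled):
        rest = labeled[:i] + labeled[i + 1:]
        disagree = sum(knn_predict(rest, x, k) != y for k in ks)
        (noisy if disagree > len(ks) / 2 else clean).append((x, y))
    return noisy, clean

def fit_gaussians(data):
    """Fit one Gaussian per class: a simple generative model p(x | y) p(y)."""
    params = {}
    for label in {y for _, y in data}:
        xs = [x for x, y in data if y == label]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-6
        params[label] = (mean, var, len(xs) / len(data))
    return params

def posterior(params, x):
    """Return {label: p(label | x)} under the fitted Gaussians."""
    joint = {
        lbl: prior * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        for lbl, (m, v, prior) in params.items()
    }
    total = sum(joint.values())
    return {lbl: p / total for lbl, p in joint.items()}

def rectify(pool, clean, oracle, budget):
    """Ask the oracle to label the `budget` least confident pool points."""
    params = fit_gaussians(clean)
    ranked = sorted(pool, key=lambda x: max(posterior(params, x).values()))
    return [(x, oracle(x)) for x in ranked[:budget]]

def true_label(x):
    """Hypothetical oracle standing in for a human annotator."""
    return int(x >= 5.0)

# Toy training dataset (assumed): (2.0, 1) and (8.0, 0) carry wrong labels.
labeled = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 1), (2.5, 0), (3.0, 0),
           (7.0, 1), (7.5, 1), (8.0, 0), (8.5, 1), (9.0, 1), (9.5, 1)]
unlabeled = [1.2, 4.8, 5.3, 8.8]

# Detect noisy labels, then add preliminary labels to the unlabeled points.
noisy, clean = detect_noise(labeled)
gen = fit_gaussians(clean)
preliminary = []
for x in unlabeled:
    post = posterior(gen, x)
    preliminary.append((x, max(post, key=post.get)))
clean_dataset = clean + preliminary

# Pool the detected noisy points with the unlabeled points.
pool = [x for x, _ in noisy] + unlabeled

# Rectify selected pool points, then use the result in training.
active_learned = rectify(pool, clean_dataset, true_label, budget=4)
final_model = fit_gaussians(clean + active_learned)
```

On this toy data the two mislabeled points are flagged as noisy. The four queried pool points are the two unlabeled points nearest the class boundary and the two flagged points. A real system would replace each component with the models and selection criteria described in the specification.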