CPC G06N 5/04 (2013.01) [G06N 20/20 (2019.01); G06N 7/01 (2023.01)] | 18 Claims
1. A computer-implemented method comprising:
training a machine-learning model using a training dataset including both noisy labeled datapoints and unlabeled datapoints, the training comprising:
detecting, by one or more processors, noise in the training dataset, the detecting applying ensemble machine learning and a generative model to the training dataset both to detect noisy labeled datapoints in the training dataset and to create a clean dataset with preliminary labels added for the unlabeled datapoints in the training dataset;
generating a datapoint pool including the detected noisy labeled datapoints and the unlabeled datapoints of the training dataset; and
data-driven rectifying of one or more selected datapoints of the generated datapoint pool including the detected noisy labeled datapoints and the unlabeled datapoints of the training dataset, the data-driven rectifying including using, by the one or more processors, meta-data-driven active learning and the clean dataset to facilitate generating an active-learned dataset with true labels added for the one or more selected datapoints of the datapoint pool including the detected noisy labeled datapoints and the unlabeled datapoints of the training dataset; and
wherein training the machine-learning model further includes using, at least in part, the generated active-learned dataset in training the machine-learning model.
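The claimed training flow (detect noise, build a clean dataset, pool the suspect and unlabeled points, rectify selected points, retrain) can be illustrated with a minimal pure-Python sketch. It is not the patented implementation, and the claim does not name particular models. The sketch therefore makes several assumptions. Noise detection uses a leave-one-out ensemble of three simple classifiers (nearest centroid, 1-NN, 3-NN). The generative model is a per-class diagonal Gaussian. Plain margin-based uncertainty sampling stands in for the unspecified "meta-data-driven" active learning. The names `rectify_and_train`, `oracle`, and `budget` are hypothetical, and `oracle` stands in for whatever source supplies true labels.

```python
import math
from collections import Counter

# Assumed toy setting: 2-D points with integer class labels.
Point = tuple[float, float]
Labeled = list[tuple[Point, int]]


def nearest_centroid(train: Labeled, x: Point) -> int:
    """Predict the class whose mean point is closest to x."""
    groups: dict[int, list[Point]] = {}
    for p, y in train:
        groups.setdefault(y, []).append(p)
    centroids = {
        y: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for y, pts in groups.items()
    }
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


def knn(train: Labeled, x: Point, k: int) -> int:
    """Majority label among the k nearest training points (stable tie-breaking)."""
    nearest = sorted(train, key=lambda t: math.dist(x, t[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def fit_gaussian(train: Labeled) -> dict:
    """Assumed generative model: class prior plus a diagonal Gaussian per class."""
    stats = {}
    for y in sorted({lab for _, lab in train}):
        pts = [p for p, lab in train if lab == y]
        mu = [sum(p[d] for p in pts) / len(pts) for d in range(2)]
        # Small variance floor keeps single-point classes well defined.
        var = [sum((p[d] - mu[d]) ** 2 for p in pts) / len(pts) + 1e-3 for d in range(2)]
        stats[y] = (len(pts) / len(train), mu, var)
    return stats


def posterior(stats: dict, x: Point) -> dict[int, float]:
    """Normalized class probabilities p(y | x) under the Gaussian model."""
    logp = {
        y: math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (x[d] - m) ** 2 / (2 * v)
            for d, (m, v) in enumerate(zip(mu, var))
        )
        for y, (prior, mu, var) in stats.items()
    }
    top = max(logp.values())
    z = sum(math.exp(l - top) for l in logp.values())
    return {y: math.exp(l - top) / z for y, l in logp.items()}


def predict(stats: dict, x: Point) -> int:
    post = posterior(stats, x)
    return max(post, key=post.get)


def rectify_and_train(labeled: Labeled, unlabeled: list[Point], oracle, budget: int):
    # 1. Noise detection: a point is flagged when at least two of the three
    #    ensemble members (trained without that point) disagree with its label.
    noisy, clean_labeled = [], []
    for i, (p, y) in enumerate(labeled):
        rest = labeled[:i] + labeled[i + 1:]
        votes = [nearest_centroid(rest, p), knn(rest, p, 1), knn(rest, p, 3)]
        (noisy if sum(v != y for v in votes) >= 2 else clean_labeled).append((p, y))
    # 2. Clean dataset: the generative model, fitted on unflagged points,
    #    adds preliminary labels to the unlabeled points.
    gen = fit_gaussian(clean_labeled)
    clean_dataset = clean_labeled + [(p, predict(gen, p)) for p in unlabeled]
    # 3. Datapoint pool: flagged points (labels distrusted) plus unlabeled points.
    pool = [p for p, _ in noisy] + list(unlabeled)
    # 4. Stand-in active learning: query the hypothetical oracle for the `budget`
    #    pool points with the smallest top-two probability margin under a model
    #    fitted on the clean dataset.
    model = fit_gaussian(clean_dataset)

    def margin(p: Point) -> float:
        probs = sorted(posterior(model, p).values(), reverse=True) + [0.0]
        return probs[0] - probs[1]

    active_learned = [(p, oracle(p)) for p in sorted(pool, key=margin)[:budget]]
    # 5. The final model is trained, in part, on the active-learned dataset.
    final_model = fit_gaussian(clean_labeled + active_learned)
    return noisy, active_learned, final_model


labeled = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 0),
           ((5, 5), 1), ((6, 5), 1), ((5, 6), 1), ((6, 6), 1),
           ((0.5, 0.5), 1)]  # the last label is deliberately wrong
unlabeled = [(0.2, 0.3), (5.2, 4.9), (2.9, 3.0)]
noisy, active_learned, final_model = rectify_and_train(
    labeled, unlabeled, oracle=lambda p: int(p[0] + p[1] > 5), budget=3)
print(noisy)  # [((0.5, 0.5), 1)]
```

In this toy run, only the mislabeled point (0.5, 0.5) is flagged as noisy. With `budget=3` it is among the pool points sent to the oracle and comes back with the label 0. The `budget` parameter is an assumed knob that caps how many pool points receive true labels.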