| CPC G06V 10/776 (2022.01) [G06V 10/771 (2022.01); G06V 10/774 (2022.01)] | 20 Claims |

|
1. A computer-implemented method for identifying mislabels in a labeled dataset, the method comprising:
accessing a training dataset comprising a plurality of labeled samples, each of the plurality of labeled samples labeled with a ground-truth label;
dividing the plurality of labeled samples into a plurality of training subsets and hold-out test subsets;
for each of a training subset and a corresponding hold-out test subset in the plurality of training subsets and hold-out test subsets,
training a machine learning model using a corresponding training subset; and
applying the trained machine learning model to a corresponding hold-out test subset to generate prediction labels for samples in the corresponding hold-out test subset, wherein each prediction label has a confidence score indicating a likelihood that the prediction label is correct;
pairing prediction labels and ground truth labels corresponding to same samples;
comparing a pair of prediction label and ground truth label corresponding to a same sample to determine whether there is a candidate mislabel;
determining whether the candidate mislabel is a mislabel based in part on a confidence score of the prediction label; and
generating for display the determined mislabel.
|
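
The steps recited in claim 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the claim does not name a model, a splitting scheme, or a decision rule, so the nearest-centroid classifier, the softmax-over-distance confidence score, the round-robin fold assignment, the `threshold` value of 0.9, and all function names (`train_centroids`, `predict`, `find_mislabels`) are hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Train a toy model: one mean feature vector (centroid) per label.

    `samples` is a list of (features, label) pairs. The model choice is an
    assumption; the claim only requires "a machine learning model".
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, features):
    """Return (prediction label, confidence score).

    Confidence is a softmax over negative Euclidean distances to each
    centroid (an assumed scoring rule, not one taken from the claim).
    """
    scores = {label: -math.dist(features, c) for label, c in centroids.items()}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / total


def find_mislabels(dataset, k=3, threshold=0.9):
    """Flag likely mislabels using k training / hold-out splits."""
    flagged = []
    for fold in range(k):
        # Divide the samples: every k-th sample is held out, the rest train.
        train = [s for i, s in enumerate(dataset) if i % k != fold]
        held_out = [(i, s) for i, s in enumerate(dataset) if i % k == fold]
        model = train_centroids(train)
        for index, (features, truth) in held_out:
            # Pair the prediction label with the ground-truth label.
            predicted, confidence = predict(model, features)
            # A disagreement is a candidate mislabel; it is kept only if
            # the prediction's confidence score reaches the threshold.
            if predicted != truth and confidence >= threshold:
                flagged.append((index, truth, predicted, confidence))
    return sorted(flagged)


# Hypothetical toy data: two well-separated clusters, with the sample at
# index 4 deliberately given the wrong label.
dataset = [
    ((0.0, 0.0), "a"), ((10.0, 10.0), "b"), ((0.5, 0.2), "a"),
    ((9.5, 10.2), "b"), ((0.1, 0.4), "b"), ((10.3, 9.8), "b"),
    ((0.3, 0.1), "a"), ((9.9, 9.7), "b"), ((0.2, 0.6), "a"),
]

flagged = find_mislabels(dataset)
for index, truth, predicted, confidence in flagged:
    # "Generating for display" is reduced to a print statement here.
    print(f"sample {index}: labeled {truth!r}, predicted {predicted!r} "
          f"(confidence {confidence:.3f})")
```

On this toy data only the sample at index 4 is flagged. Because each sample is scored by a model that never saw it during training, a wrong label cannot simply be memorized, and the confidence threshold keeps low-confidence disagreements from being reported as mislabels.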