US 12,482,242 B1
Data centric mislabel detection
Yu Qing Zhou, Stanford, CA (US); Dillon Laird, Santa Monica, CA (US); Yuxiang Zhang, Shanghai (CN); Andrew Yan-Tak Ng, Camas, WA (US); Daniel Bibireata, Bellevue, WA (US); Kai Yang, Fremont, CA (US); Shankaranand Jagadeesan, San Jose, CA (US); and Mark William Sabini, River Edge, NJ (US)
Assigned to LandingAI Inc., Palo Alto, CA (US)
Filed by LandingAI Inc., Palo Alto, CA (US)
Filed on Jul. 28, 2023, as Appl. No. 18/227,800.
Claims priority of provisional application 63/393,699, filed on Jul. 29, 2022.
Int. Cl. G06V 10/776 (2022.01); G06V 10/771 (2022.01); G06V 10/774 (2022.01)
CPC G06V 10/776 (2022.01) [G06V 10/771 (2022.01); G06V 10/774 (2022.01)]
20 Claims
 
1. A computer-implemented method for identifying mislabels in a labeled dataset, the method comprising:
accessing a training dataset comprising a plurality of labeled samples, each of the plurality of labeled samples labeled with a ground-truth label;
dividing the plurality of labeled samples into a plurality of training subsets and hold-out test subsets;
for each of a training subset and a corresponding hold-out test subset in the plurality of training subsets and hold-out test subsets,
training a machine learning model using a corresponding training subset; and
applying the trained machine learning model to a corresponding hold-out test subset to generate prediction labels for samples in the corresponding hold-out test subset, wherein each prediction label has a confidence score indicating a likelihood that the prediction label is correct;
pairing prediction labels and ground truth labels corresponding to same samples;
comparing a pair of prediction label and ground truth label corresponding to a same sample to determine whether there is a candidate mislabel;
determining whether the candidate mislabel is a mislabel based in part on a confidence score of the prediction label; and
generating for display the determined mislabel.
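
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the nearest-centroid "model", the softmax-style confidence score, the interleaved fold assignment, and the names `train_centroids`, `predict`, `find_mislabels`, `n_folds`, and `threshold` are all assumptions chosen only to keep the example self-contained. The claim itself does not specify a model type, a number of subsets, or a particular confidence threshold.

```python
import math
import random
from collections import defaultdict


def train_centroids(samples):
    """Hypothetical stand-in model: the per-class mean of the feature vectors."""
    sums, counts = {}, defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(centroids, features):
    """Return (prediction label, confidence score).

    Confidence is a softmax over negative squared distances to each class
    centroid (an assumption; any calibrated score would serve the same role).
    """
    labels = list(centroids)
    scores = [
        -sum((f - c) ** 2 for f, c in zip(features, centroids[lab]))
        for lab in labels
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    best = max(range(len(labels)), key=lambda i: exps[i])
    return labels[best], exps[best] / sum(exps)


def find_mislabels(dataset, n_folds=5, threshold=0.9, seed=0):
    """Flag samples whose hold-out prediction disagrees with the given label
    and whose confidence score meets the (assumed) threshold."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    # Divide the samples into hold-out subsets; the rest of the data is the
    # corresponding training subset for each one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    flagged = []
    for held_out in folds:
        held = set(held_out)
        model = train_centroids([dataset[i] for i in indices if i not in held])
        for i in held_out:
            features, given_label = dataset[i]
            predicted_label, confidence = predict(model, features)
            # Pair the prediction with the given label; a disagreement is a
            # candidate mislabel, confirmed only if confidence is high enough.
            if predicted_label != given_label and confidence >= threshold:
                flagged.append((i, given_label, predicted_label, confidence))
    return sorted(flagged)


# Toy data: class "a" near 0, class "b" near 10, with sample 3 deliberately
# given the wrong label.
dataset = [([x / 10], "a") for x in range(10)]
dataset += [([10 + x / 10], "b") for x in range(10)]
dataset[3] = ([0.3], "b")

flagged = find_mislabels(dataset)
for index, given, predicted, confidence in flagged:
    print(f"sample {index}: labeled {given!r}, predicted {predicted!r}, "
          f"confidence {confidence:.3f}")
```

Because every sample is scored by a model that never saw it during training, a wrong label cannot simply be memorized. The confidence check then separates likely mislabels from ordinary low-confidence disagreements.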