US 12,189,719 B1
Automatic benchmarking of labeling tasks
Nathaniel John Herman, San Francisco, CA (US); Akshat Bubna, San Francisco, CA (US); Alexandr Wang, San Francisco, CA (US); Shariq Shahab Hashme, San Francisco, CA (US); Samuel J. Clearman, San Francisco, CA (US); Liren Tu, Danville, CA (US); Jeffrey Zhihong Li, San Francisco, CA (US); and James Lennon, San Francisco, CA (US)
Assigned to Scale AI, Inc., San Francisco, CA (US)
Filed by Scale AI, Inc., San Francisco, CA (US)
Filed on Jan. 6, 2022, as Appl. No. 17/570,063.
Application 17/570,063 is a continuation of application No. 16/730,840, filed on Dec. 30, 2019, granted, now Pat. No. 11,308,364.
Int. Cl. G06F 18/21 (2023.01); G06F 18/2115 (2023.01); G06F 18/214 (2023.01); G06F 18/2433 (2023.01); G06N 20/20 (2019.01)
CPC G06F 18/2178 (2023.01) [G06F 18/2115 (2023.01); G06F 18/2148 (2023.01); G06F 18/2433 (2023.01); G06N 20/20 (2019.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method for evaluating labeled data, comprising:
selecting, based on a distribution of label values included in a set of labels for a data sample, a subset of at least two different labels, wherein each label of the subset is selected based on a corresponding label value that is a non-outlier in the distribution of label values, and wherein the subset of the at least two different labels is further selected based on high consensus among a first group of users with labeling performance that exceeds a threshold and low consensus among a second group of users with labeling performance that falls below the threshold;
receiving an additional label for the data sample;
determining a benchmark for the data sample based on the subset of the at least two different labels;
generating, based on one or more comparisons of the additional label with the benchmark, a benchmark score for the additional label; and
generating a set of performance metrics for labeling the data sample based on the benchmark score.
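The steps recited in claim 1 can be sketched in Python. The specific choices below — a z-score test for non-outliers, standard deviation as the consensus measure, the mean as the benchmark aggregate, and a reciprocal-distance benchmark score — are illustrative assumptions; the claim does not specify any particular statistics.

```python
import statistics

def select_benchmark_labels(labels, performance, threshold,
                            outlier_z=2.0, consensus_tol=1.0):
    """Select a subset of at least two non-outlier labels on which
    high-performing labelers agree and low-performing labelers do not.

    labels: list of (user, value) pairs; performance: user -> score.
    The z-score outlier test and stdev-based consensus measure are
    illustrative choices, not drawn from the claim."""
    values = [v for _, v in labels]
    mean = statistics.mean(values)
    spread = statistics.pstdev(values) or 1.0
    # Keep only labels whose value is a non-outlier in the distribution.
    non_outliers = [(u, v) for u, v in labels
                    if abs(v - mean) / spread <= outlier_z]
    strong = [v for u, v in non_outliers if performance[u] > threshold]
    weak = [v for u, v in non_outliers if performance[u] <= threshold]
    strong_spread = statistics.pstdev(strong) if len(strong) > 1 else float("inf")
    weak_spread = statistics.pstdev(weak) if len(weak) > 1 else float("inf")
    # Require high consensus among strong labelers and low consensus
    # among weak ones, with at least two labels in the subset.
    if strong_spread <= consensus_tol < weak_spread and len(strong) >= 2:
        return [(u, v) for u, v in non_outliers if performance[u] > threshold]
    return []

def determine_benchmark(subset):
    # One plausible aggregate: the mean of the selected label values.
    return statistics.mean(v for _, v in subset)

def benchmark_score(label_value, benchmark):
    # A simple closeness score in (0, 1]; the claim leaves the exact
    # comparison of the additional label to the benchmark unspecified.
    return 1.0 / (1.0 + abs(label_value - benchmark))

def performance_metrics(scores, pass_threshold=0.5):
    # Aggregate per-label benchmark scores into labeling-performance metrics.
    return {
        "mean_score": statistics.mean(scores),
        "pass_rate": sum(s >= pass_threshold for s in scores) / len(scores),
    }
```

In use, labels from users above the performance threshold that cluster tightly (while weaker users disagree and extreme values are excluded) form the benchmark subset; each additional label received for the sample is then scored against that benchmark, and the scores feed the performance metrics.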