US 11,893,461 B2
System and method for labeling machine learning inputs
Sean Rowan, San Diego, CA (US); and Joseph Cessna, San Diego, CA (US)
Assigned to Intuit Inc., Mountain View, CA (US)
Filed by Intuit Inc., Mountain View, CA (US)
Filed on Jan. 27, 2022, as Appl. No. 17/586,100.
Application 17/586,100 is a continuation of application No. 16/142,393, filed on Sep. 26, 2018, granted, now 11,321,629.
Prior Publication US 2022/0147879 A1, May 12, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 20/00 (2019.01); G06Q 40/10 (2023.01)
CPC G06N 20/00 (2019.01) [G06Q 40/10 (2013.01)] 18 Claims
 
1. A method for generating labeled training set data for a machine learning process, the method performed by one or more processors of a machine learning-based labeling system and comprising:
retrieving, using a machine learning analysis model, labeled data indicating labels entered by a user for a plurality of data items, the analysis model trained to generate a prediction of a training data label that a given user will enter for an unlabeled training data item based on training data items that the given user has already labeled;
identifying, using the trained analysis model, one or more characteristics of the labeled data, each respective characteristic of the identified characteristics predictive of a label that the user will enter for an unlabeled data item having the respective characteristic;
generating, for each respective unlabeled data item of a set of unlabeled data items, using the trained analysis model, a prediction of a label that the user will enter for the respective unlabeled data item and a confidence score indicative of a likelihood that the predicted label is correct;
selecting, based on the confidence scores, a subset of the set of unlabeled data items to be presented for labeling;
receiving one or more labels entered for the selected subset of unlabeled data items;
determining, based on the one or more labels entered for the subset of unlabeled data items, that a completion criterion associated with the trained analysis model is met; and
generating, using the trained analysis model, a label for one or more remaining unlabeled data items of the set of unlabeled data items.
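The steps of claim 1 resemble what is commonly called active learning with uncertainty sampling: predict labels with confidence scores, ask the user about the least confident items, stop once a completion criterion holds, and auto-label the rest. The Python sketch below illustrates that loop under stated assumptions and is not the patented implementation. The names `FrequencyLabelModel` and `label_dataset`, the count-based "analysis model", the lowest-confidence selection rule, and the batch-accuracy completion criterion are all hypothetical choices made for clarity.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical data model: an item maps characteristic names to values,
# e.g. {"merchant": "Shell", "type": "card"}.
Item = Dict[str, str]
Feature = Tuple[str, str]


class FrequencyLabelModel:
    """Hypothetical stand-in for the claimed 'analysis model'.

    For each (characteristic, value) pair it counts how often the user chose
    each label, then predicts by pooling the counts of an item's pairs.
    """

    def __init__(self) -> None:
        self.counts: Dict[Feature, Counter] = defaultdict(Counter)

    def fit(self, items: List[Item], labels: List[str]) -> None:
        self.counts.clear()
        for item, label in zip(items, labels):
            for feature in item.items():
                self.counts[feature][label] += 1

    def characteristics(self, min_purity: float = 0.9) -> Dict[Feature, str]:
        """Characteristics whose observed labels are dominated by one label."""
        found = {}
        for feature, counter in self.counts.items():
            label, votes = counter.most_common(1)[0]
            if votes / sum(counter.values()) >= min_purity:
                found[feature] = label
        return found

    def predict(self, item: Item) -> Tuple[Optional[str], float]:
        """Return (predicted label, confidence in [0, 1])."""
        pooled: Counter = Counter()
        for feature in item.items():
            pooled.update(self.counts.get(feature, Counter()))
        if not pooled:
            return None, 0.0  # no known characteristic: no evidence at all
        label, votes = pooled.most_common(1)[0]
        return label, votes / sum(pooled.values())


def label_dataset(
    labeled: List[Tuple[Item, str]],
    unlabeled: List[Item],
    ask_user: Callable[[Item], str],
    batch_size: int = 2,
    target_accuracy: float = 0.9,
) -> List[Tuple[Item, Optional[str]]]:
    """Query the user on low-confidence items until the completion criterion
    holds, then let the model label whatever is left."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = FrequencyLabelModel()
    while pool:
        model.fit([item for item, _ in labeled], [lab for _, lab in labeled])
        scored = [(model.predict(item), item) for item in pool]
        # Uncertainty sampling (assumed selection rule): lowest confidence first.
        scored.sort(key=lambda entry: entry[0][1])
        batch, rest = scored[:batch_size], scored[batch_size:]
        pool = [item for _, item in rest]
        hits = 0
        for (predicted, _), item in batch:
            answer = ask_user(item)
            labeled.append((item, answer))
            hits += predicted == answer
        # Assumed completion criterion: the model's predictions for this
        # batch matched the user's answers often enough.
        if hits / len(batch) >= target_accuracy:
            break
    model.fit([item for item, _ in labeled], [lab for _, lab in labeled])
    # Remaining items are labeled automatically (None if nothing is known).
    return labeled + [(item, model.predict(item)[0]) for item in pool]
```

With two seed transactions (Shell labeled "Fuel", Staples labeled "Office") and five unlabeled ones, calling `label_dataset(seed, pool, oracle, batch_size=2, target_accuracy=1.0)` asks the user for four labels over two rounds and labels the fifth item automatically. Sending the least confident items to the user spends manual effort where the model has the least evidence. A real system might instead use a stronger classifier and estimate accuracy on held-out labels before declaring the criterion met.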