US 11,868,436 B1
Artificial intelligence system for efficient interactive training of machine learning models
Sedat Gokalp, Issaquah, WA (US); and Abhishek Dan, Sammamish, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 14, 2018, as Appl. No. 16/008,894.
Int. Cl. G06F 16/28 (2019.01); G06F 18/214 (2023.01); G06N 7/00 (2023.01); G06N 20/00 (2019.01); G06F 18/241 (2023.01); G06F 18/21 (2023.01)
CPC G06F 18/2148 (2023.01) [G06F 18/2178 (2023.01); G06F 18/241 (2023.01); G06N 7/00 (2013.01); G06N 20/00 (2019.01)]
18 Claims
1. A system, comprising:
one or more computing devices of an artificial intelligence-based classification service;
wherein the one or more computing devices are configured to:
perform one or more training iterations until a training completion criterion is met, wherein a particular training iteration comprises at least:
obtaining, via an interactive programmatic interface, respective class labels from a label provider for at least some data items of a particular set of data items identified as candidates for labeling feedback in a previous training iteration, wherein at least some class labels of the respective class labels are obtained from the label provider asynchronously with respect to (a) a start of the particular training iteration and (b) an end of the previous training iteration, wherein a class label of a data item represents a class into which the data item is to be classified;
generating, using one or more classifiers, classification predictions corresponding to a test set, wherein an individual classifier of the one or more classifiers is trained using a training set that includes at least some of the class labels obtained using the interactive programmatic interface; and
identifying, based at least in part on (a) the classification predictions and (b) an active learning algorithm, another set of data items as candidates for labeling feedback to request class labels from the label provider with respect to a next training iteration; and
wherein a different training iteration prior to the particular iteration comprises:
use a first machine learning model to identify at least a first attribute value of one or more data items, such that a correlation between presence of the first attribute value and a variation in classification predictions of the one or more data items exceeds a threshold;
identify, using the first attribute, at least one data item as a candidate for labeling feedback in the particular training iteration;
provide, after the training completion criterion has been met, a respective classification prediction obtained from a particular classifier with respect to one or more data items, wherein the particular classifier was trained using a particular training set, wherein class labels for at least some items of the particular training set were obtained in the one or more training iterations.
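The iterative loop recited in claim 1 (collect labels that arrive asynchronously, train, predict, select new labeling candidates) can be illustrated with a minimal Python sketch. This is not the patented implementation. Everything here is an assumption made for illustration: the toy 1-D nearest-centroid classifier, uncertainty sampling as the active learning algorithm, a fixed iteration count as the training completion criterion, an in-process queue standing in for the interactive programmatic interface, and the hypothetical names `train`, `predict`, `select_candidates`, `run`, and `oracle`. For brevity the sketch scores the unlabeled pool rather than a separate test set, and it omits the attribute-correlation step of the earlier iteration.

```python
from queue import Queue, Empty

def train(labeled):
    """Toy classifier (assumption): per-class mean of a 1-D feature."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def predict(centroids, x):
    """Return (predicted_class, margin); a small margin means low confidence."""
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def select_candidates(centroids, pool, k):
    """Uncertainty sampling: the k unlabeled items with the smallest margin."""
    return sorted(pool, key=lambda i: predict(centroids, pool[i])[1])[:k]

def run(items, oracle, seed_ids, batch=2, max_iters=5):
    inbox = Queue()              # labels land here whenever the provider answers
    labeled = {}                 # item id -> class label
    candidates = list(seed_ids)  # candidates identified in the "previous" iteration
    centroids = {}
    for _ in range(max_iters):   # completion criterion (assumed): fixed budget
        # Stand-in for the label provider responding at its own pace.
        for i in candidates:
            inbox.put((i, oracle(items[i])))
        # Drain whatever labels have arrived; do not block waiting for more.
        while True:
            try:
                i, y = inbox.get_nowait()
            except Empty:
                break
            labeled[i] = y
        # Train on the labels gathered so far.
        centroids = train([(items[i], y) for i, y in labeled.items()])
        pool = {i: x for i, x in items.items() if i not in labeled}
        if not pool:
            break
        # Identify the next set of candidates for labeling feedback.
        candidates = select_candidates(centroids, pool, batch)
    return centroids, labeled
```

A caller might build `items` as a mapping from item id to feature value, pass a function that returns the class of a value as `oracle`, and then use `predict` on the returned centroids to obtain classification predictions once the loop has finished.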