US 12,079,648 B2
	Framework of proactive and/or reactive strategies for improving labeling consistency and efficiency
Evelyn Duesterwald, Millwood, NY (US); Austin Zachary Henley, Memphis, TN (US); David John Piorkowski, White Plains, NY (US); and John T. Richards, Honeoye Falls, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Dec. 28, 2017, as Appl. No. 15/857,599.
Prior Publication US 2019/0205703 A1, Jul. 4, 2019
Int. Cl. G06F 9/451 (2018.01); G06F 17/18 (2006.01); G06F 18/24 (2023.01); G06F 18/40 (2023.01); G06F 40/169 (2020.01); G06N 20/00 (2019.01); G06F 16/28 (2019.01)

CPC G06F 9/453 (2018.02) [G06F 17/18 (2013.01); G06F 18/24 (2023.01); G06F 18/41 (2023.01); G06F 40/169 (2020.01); G06N 20/00 (2019.01); G06F 16/285 (2019.01)]

13 Claims

1. A method for improving performance of a computer implementing a machine learning system, said method comprising:

providing, via a graphical user interface, to an annotator, unlabeled corpus data;

obtaining, via said graphical user interface, labels for said unlabeled corpus data;

detecting, with a consistency calculation routine, concurrent with said obtaining of said labels, at least internal inconsistency in said labels based on a comparison of an inconsistency measurement in relation to a given threshold, said detecting including periodically retesting said annotator on a portion of data previously labeled by said annotator, the periodic retesting comprising re-presenting, in unlabeled form via said graphical user interface, said portion of data that was previously labeled by said annotator and, in response, receiving a new label from the annotator, the inconsistency measurement being based on a determination of whether said new label is consistent with an initial label provided previously by the annotator to respond to an initial presentation of said portion of data;

responsive to said detection of said internal inconsistency, intervening in said obtaining of said labels, concurrent with said obtaining of said labels, with a reactive intervention subsystem until said internal inconsistency in said labels is addressed;

completing said obtaining of said labels subsequent to said intervening;

carrying out training of said machine learning system to provide a trained machine learning system, based on results of said completing of said obtaining of said labels subsequent to said intervening; and

carrying out classification of new data with said trained machine learning system.
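The detecting step of claim 1 (periodic retesting of an annotator on previously labeled data, then comparing an inconsistency measurement to a threshold) can be illustrated with a minimal sketch. It assumes the inconsistency measurement is the fraction of re-presented items whose new label disagrees with the annotator's initial label. The claim does not specify this formula. The class name `ConsistencyMonitor`, the `threshold` value, and the `retest_every` interval are hypothetical choices made for illustration. The reactive intervention itself is left to the caller.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConsistencyMonitor:
    """Hypothetical retest-based consistency tracker (illustrative only)."""
    threshold: float = 0.2   # assumed inconsistency threshold
    retest_every: int = 5    # assumed: retest after every N newly labeled items
    initial_labels: Dict[str, str] = field(default_factory=dict)
    retest_agreed: List[bool] = field(default_factory=list)

    def record_label(self, item_id: str, label: str) -> None:
        # Keep the annotator's first label for each item for later comparison.
        self.initial_labels.setdefault(item_id, label)

    def due_for_retest(self) -> bool:
        # Periodic retesting: trigger once every `retest_every` labeled items.
        n = len(self.initial_labels)
        return n > 0 and n % self.retest_every == 0

    def pick_retest_item(self, rng: random.Random) -> str:
        # Select a previously labeled item to re-present without its label.
        return rng.choice(sorted(self.initial_labels))

    def record_retest(self, item_id: str, new_label: str) -> None:
        # Record whether the new label matches the initial label.
        self.retest_agreed.append(new_label == self.initial_labels[item_id])

    def inconsistency(self) -> float:
        # Fraction of retests where the annotator disagreed with themselves.
        if not self.retest_agreed:
            return 0.0
        return self.retest_agreed.count(False) / len(self.retest_agreed)

    def is_inconsistent(self) -> bool:
        # Compare the inconsistency measurement against the threshold.
        return self.inconsistency() > self.threshold


if __name__ == "__main__":
    monitor = ConsistencyMonitor(threshold=0.2, retest_every=2)
    monitor.record_label("doc1", "positive")
    monitor.record_label("doc2", "negative")
    if monitor.due_for_retest():
        item = monitor.pick_retest_item(random.Random(0))
        print("re-present", item, "without its label")
    monitor.record_retest("doc1", "positive")  # agrees with the initial label
    monitor.record_retest("doc2", "positive")  # disagrees with the initial label
    print(monitor.inconsistency())    # 0.5
    print(monitor.is_inconsistent())  # True: a caller would now intervene
```

In this sketch, a `True` result from `is_inconsistent()` is the point at which a labeling tool would pause normal labeling and hand control to an intervention step (for example, showing the annotator their conflicting labels) until the measurement falls back under the threshold. That behavior is an assumption; the claim only requires that intervention continue until the inconsistency is addressed.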