US 11,941,493 B2
Discovering and resolving training conflicts in machine learning systems
Michael Desmond, White Plains, NY (US); Matthew R. Arnold, Ridgefield Park, NJ (US); and Jeffrey S. Boston, Wappingers Falls, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Feb. 27, 2019, as Appl. No. 16/287,224.
Prior Publication US 2020/0272938 A1, Aug. 27, 2020
Int. Cl. G06N 20/00 (2019.01); G06N 3/08 (2023.01); G06N 7/01 (2023.01)
CPC G06N 20/00 (2019.01) [G06N 3/08 (2013.01); G06N 7/01 (2023.01)]
20 Claims
 
1. A method comprising:
discovering, by a conflict detection system, a ground truth conflict between a first training data and a second training data for a machine learning system based on context of ground truth data, wherein
the discovering further comprises using ground truth clustering with cross validation and decision space clustering to train the machine learning system from all ground truth data, wherein
the first training data and the second training data are a same text query, and wherein the first training data and the second training data have different labels that describe the same text query, and wherein
the conflict is caused by the different labels describing one or more different ground truths that are accurate in different contexts, and wherein
context of the one or more different ground truths is based on information found in the first training data and the second training data as well as information found in similar ground truth examples, and wherein
the using of decision space clustering comprises training the machine learning system with all the ground truth data, generating confidence vectors for each ground truth example, clustering the confidence vectors based on density in an unsupervised manner, and comparing the clusters of the confidence vectors with original labeling to determine one or more conflicts;
in response to discovering the ground truth conflict between the first training data and the second training data for the machine learning system based on the different labels, applying conflict resolutions to the first training data and the second training data based on the context of the one or more different ground truths; and
training the machine learning system using the first training data and the second training data with adjusted context-based labels.
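
The decision space clustering steps recited in claim 1 (train on all ground truth data, generate a confidence vector per example, cluster the vectors by density without supervision, compare clusters with the original labels) can be illustrated with a minimal sketch. This is not the patented implementation. The names `duplicate_label_conflicts`, `train_naive_bayes`, `confidence_vector`, `density_clusters`, `decision_space_conflicts` and the `eps` threshold are all hypothetical. A tiny naive Bayes classifier stands in for the machine learning system, and connected components under a distance threshold stand in for a density-based clustering algorithm. The example queries and labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def duplicate_label_conflicts(examples):
    """Map each identical text query to its labels when it has more than one."""
    labels_by_text = defaultdict(set)
    for text, label in examples:
        labels_by_text[text.strip().lower()].add(label)
    return {t: sorted(ls) for t, ls in labels_by_text.items() if len(ls) > 1}


def train_naive_bayes(examples):
    """Fit a tiny multinomial naive Bayes model (assumed stand-in classifier)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def confidence_vector(model, text):
    """Return posterior probabilities over all labels, in sorted label order."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = []
    for label in sorted(label_counts):
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        log_scores.append(score)
    peak = max(log_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in log_scores]
    norm = sum(weights)
    return [w / norm for w in weights]


def density_clusters(vectors, eps=0.2):
    """Assign cluster ids by linking vectors that lie within eps of each other."""
    cluster_of = [None] * len(vectors)
    next_id = 0
    for start in range(len(vectors)):
        if cluster_of[start] is not None:
            continue
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(vectors)):
                if cluster_of[j] is None and math.dist(vectors[i], vectors[j]) <= eps:
                    cluster_of[j] = next_id
                    stack.append(j)
        next_id += 1
    return cluster_of


def decision_space_conflicts(examples, eps=0.2):
    """Flag examples whose confidence-vector cluster mixes original labels."""
    model = train_naive_bayes(examples)  # train on all ground truth data
    vectors = [confidence_vector(model, text) for text, _ in examples]
    cluster_of = density_clusters(vectors, eps)
    labels_in_cluster = defaultdict(set)
    for (_, label), cid in zip(examples, cluster_of):
        labels_in_cluster[cid].add(label)
    return [ex for ex, cid in zip(examples, cluster_of)
            if len(labels_in_cluster[cid]) > 1]


# Invented ground truth: the same query appears under two different labels.
examples = [
    ("reset my password", "account"),
    ("reset my password", "security"),
    ("change my password", "account"),
    ("what is my balance", "billing"),
    ("show my balance", "billing"),
]
print(duplicate_label_conflicts(examples))
print(decision_space_conflicts(examples))
```

Identical queries always produce identical confidence vectors, so they fall into the same cluster and are flagged whenever their original labels differ. Whether near-duplicates are also flagged depends on the assumed `eps` value. The resolution and retraining steps of the claim are not covered by this sketch.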