US 12,293,438 B2
Visualize data and significant records based on relationship with the model
Wen Pei Yu, Xi'an (CN); Xiao Ming Ma, Xi'an (CN); Xue Ying Zhang, Xi'an (CN); Si Er Han, Xi'an (CN); Jing James Xu, Xi'an (CN); Jing Xu, Xi'an (CN); and Jun Wang, Xi'an (CN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Dec. 13, 2022, as Appl. No. 18/064,959.
Prior Publication US 2024/0193830 A1, Jun. 13, 2024
Int. Cl. G06T 11/20 (2006.01); G06N 20/00 (2019.01)
CPC G06T 11/206 (2013.01) [G06N 20/00 (2019.01)]
16 Claims
 
1. A computer-implemented method comprising:
presenting, by one or more processors, a first visualization of a training dataset in a first plot;
responsive to receiving a selection of a data group of the training dataset to analyze, identifying, by the one or more processors, three or fewer key model features of the data group of the training dataset;
ascertaining, by the one or more processors, a representative record of each key model feature of the three or fewer key model features using a Local Interpretable Model-Agnostic Explanation technique;
presenting, by the one or more processors, a second visualization of the three or fewer key model features and the representative record of each key model feature in a second plot;
correcting or completing, by the one or more processors, the training dataset, wherein the training dataset is either incorrect or incomplete;
prior to presenting the first visualization of the training dataset in the first plot, gathering, by one or more processors, the training dataset from one or more sources;
determining, by one or more processors, a degree of importance of the one or more key model features;
ranking, by one or more processors, the one or more key model features according to the degree of importance; and
selecting, by one or more processors, the three or fewer key model features based on a set of criteria, wherein the set of criteria is selected from a group consisting of: a degree of accuracy of each key model feature of the training dataset and a pre-set configuration, wherein the selecting further comprises:
selecting, by one or more processors, two key model features of the three or fewer key model features selected; and
condensing, by one or more processors, a key model feature of the three or fewer key model features not selected into a linear combination using a Principal Component Analysis to produce three-dimensional condensed data.
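
The ranking, selecting, and condensing steps recited above can be illustrated with a minimal sketch. It is one possible reading of the claim language, not the patented implementation: it assumes that importance scores are already available as a dictionary, that the two top-ranked features are kept as plot axes, and that all remaining features are condensed into a single linear combination (the first principal component) to form the third axis. The function names `rank_features`, `first_principal_component`, and `condense_to_3d`, and the sample data, are hypothetical. The principal component is approximated with plain power iteration so that the sketch needs only the standard library.

```python
import math


def rank_features(importance):
    # Order feature names from most to least important (assumed scores).
    return sorted(importance, key=importance.get, reverse=True)


def first_principal_component(rows, iterations=200):
    # Mean-centre the rows, build the sample covariance matrix, then use
    # power iteration to approximate its dominant eigenvector.
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start vector
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along v; keep the current direction
            break
        v = [x / norm for x in w]
    # Project each centred row onto the component: one score per record.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores


def condense_to_3d(records, ranked, keep=2):
    # Assumption: keep the two top-ranked features as axes and condense
    # every other ranked feature into one principal-component score.
    kept, rest = ranked[:keep], ranked[keep:]
    if len(kept) < 2 or not rest:
        raise ValueError("need two kept features and at least one to condense")
    rows = [[rec[f] for f in rest] for rec in records]
    _, scores = first_principal_component(rows)
    return [(rec[kept[0]], rec[kept[1]], s) for rec, s in zip(records, scores)]


# Hypothetical importance scores and records, for illustration only.
importance = {"a": 0.50, "b": 0.30, "c": 0.15, "d": 0.05}
records = [
    {"a": 10.0, "b": 0.1, "c": 1.0, "d": 2.0},
    {"a": 20.0, "b": 0.2, "c": 2.0, "d": 4.0},
    {"a": 30.0, "b": 0.3, "c": 3.0, "d": 6.0},
    {"a": 40.0, "b": 0.4, "c": 4.0, "d": 8.0},
]
ranked = rank_features(importance)
points = condense_to_3d(records, ranked)
```

Each element of `points` is an (x, y, z) triple suitable for a three-dimensional scatter plot. In practice a library routine for Principal Component Analysis would likely replace the hand-written power iteration, which can stall if the start vector happens to be orthogonal to the dominant component.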