US 12,346,784 B2
Intelligent scoring of missing data records
Jin Wang, Xi'an (CN); Si Er Han, Xi'an (CN); Lei Gao, Xi'an (CN); Jing James Xu, Xi'an (CN); A Peng Zhang, Xi'an (CN); and Jun Wang, Xi'an (CN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Sep. 16, 2020, as Appl. No. 17/022,734.
Prior Publication US 2022/0083918 A1, Mar. 17, 2022
Int. Cl. G06N 20/20 (2019.01); G06F 16/215 (2019.01); G06F 16/28 (2019.01)
CPC G06N 20/20 (2019.01) [G06F 16/215 (2019.01); G06F 16/285 (2019.01)]
20 Claims
 
1. A computer-implemented method comprising:
monitoring, by one or more computer processors, a database server for training data with one or more missing values;
grouping, by one or more computer processors, a plurality of predictors contained in the training data into a plurality of predictor groups, wherein a number of predictors in the plurality of predictors associated with records with the one or more missing values is less than a square root of a total number of predictors;
creating, by one or more computer processors, a plurality of sample sets, wherein each sample set in the plurality of sample sets contains one or more predictors selected from a respective predictor group in the plurality of predictor groups;
training, by one or more computer processors, a cluster model for each created sample set in the plurality of created sample sets; and
generating, by one or more computer processors, a score for a record with one or more missing values utilizing at least one created cluster model of the created cluster models and at least one created sample set of the created sample sets, comprising:
responsive to a created top sample set, generating, by one or more computer processors, the score utilizing an ensemble score defined by a distance between a formed vector to each respective center of each cluster associated with each sample set in the top sample set.
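
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions introduced only for illustration: the round-robin grouping rule, taking the first predictors of each group as its sample set, a small k-means as the cluster model, treating the "top sample set" as the sample sets whose predictors are all present in the record, and averaging the nearest-center distances as the ensemble score. All function names are hypothetical.

```python
import math

def group_predictors(n_predictors, n_groups):
    """Round-robin grouping (an assumption; the claim does not fix a rule)."""
    return [list(range(g, n_predictors, n_groups)) for g in range(n_groups)]

def missing_predictor_check(records):
    """Claimed condition: predictors with missing values < sqrt(total predictors)."""
    with_missing = {i for r in records for i, v in enumerate(r) if v is None}
    return len(with_missing) < math.sqrt(len(records[0]))

def create_sample_sets(groups, per_group):
    """One sample set per group; here simply the first `per_group` predictors."""
    return [g[:per_group] for g in groups]

def kmeans(points, k, iters=10):
    """Tiny k-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        buckets = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            buckets[nearest].append(p)
        # Recompute each center as the mean of its bucket; keep it if the bucket is empty.
        centers = [[sum(col) / len(b) for col in zip(*b)] if b else centers[i]
                   for i, b in enumerate(buckets)]
    return centers

def train_models(records, sample_sets, k=2):
    """Train one cluster model per sample set on records complete for that set."""
    models = []
    for cols in sample_sets:
        rows = [[r[c] for c in cols] for r in records
                if all(r[c] is not None for c in cols)]
        models.append(kmeans(rows, k))
    return models

def score(record, sample_sets, models):
    """Average nearest-center distance over the sample sets usable for this record."""
    dists = []
    for cols, centers in zip(sample_sets, models):
        if any(record[c] is None for c in cols):
            continue  # this sample set needs a value the record is missing
        vec = [record[c] for c in cols]  # the "formed vector" for this sample set
        dists.append(min(math.dist(vec, ctr) for ctr in centers))
    return sum(dists) / len(dists) if dists else None

# Illustrative data: four predictors, one record with a missing value (None).
records = [
    [0.0, 0.1, 0.0, 0.2],
    [0.1, 0.0, 0.2, 0.0],
    [10.0, 10.1, 10.0, 10.2],
    [10.1, 10.0, 10.2, 10.0],
    [0.0, None, 0.1, 0.1],
]
groups = group_predictors(4, 2)
sample_sets = create_sample_sets(groups, per_group=2)
models = train_models(records, sample_sets)
print(score(records[-1], sample_sets, models))
```

In this sketch a record lying close to a learned cluster center receives a small score, while a record far from every center receives a large one; a record for which no sample set is usable yields `None`.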