US 11,742,081 B2
Data model processing in machine learning employing feature selection using sub-population analysis
Uri Kartoun, Cambridge, MA (US); Kristen Severson, Somerville, MA (US); Kenney Ng, Arlington, MA (US); Paul D. Myers, Bloomfield Hills, MI (US); Wangzhi Dai, Cambridge, MA (US); and Collin M. Stultz, Newton, MA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US); and Massachusetts Institute of Technology, Cambridge, MA (US)
Filed by International Business Machines Corporation, Armonk, NY (US); and Massachusetts Institute of Technology, Cambridge, MA (US)
Filed on Apr. 30, 2020, as Appl. No. 16/863,452.
Prior Publication US 2021/0343421 A1, Nov. 4, 2021
Int. Cl. G16H 40/67 (2018.01); G16H 50/50 (2018.01); G06F 17/18 (2006.01); G16H 50/70 (2018.01); G06F 18/23 (2023.01)
CPC G16H 40/67 (2018.01) [G06F 17/18 (2013.01); G06F 18/23 (2023.01); G16H 50/50 (2018.01); G16H 50/70 (2018.01)]
20 Claims
1. A computer-implemented method in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to select features of a dataset for predictive modeling, the computer-implemented method comprising:
selecting, from a dataset comprising a plurality of cases and controls, a first set of features that are relevant to outcome;
identifying a subset of cases and controls having similar values for the first set of features;
analyzing the subset of cases and controls to select a second set of additional features that are relevant to outcome;
evaluating performance of a first predictive model against a second predictive model to determine that the second predictive model more accurately predicts outcome, wherein the first predictive model is based on the first set of features and the second predictive model is based on the first set of features and the second set of additional features; and
utilizing the second predictive model to predict outcomes.
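
The steps recited in claim 1 can be illustrated with a small Python sketch. It is not the patented implementation: the claim does not specify a relevance measure, a similarity criterion, a model type, or a performance metric, so every concrete choice here is an assumption made for illustration. In particular, relevance is scored as the gap between case and control means (`threshold`), "similar values" means a case and a control lie within a fixed `caliper` on every first-set feature, both predictive models are nearest-centroid classifiers, and performance is leave-one-out accuracy. The toy cohort and all function names are hypothetical.

```python
from statistics import mean

def relevant_features(X, y, candidates, threshold=0.3):
    """Keep candidate features whose case/control mean gap is >= threshold (assumed measure)."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    return [j for j in candidates
            if abs(mean(r[j] for r in cases) - mean(r[j] for r in controls)) >= threshold]

def similar_subset(X, y, features, caliper=0.25):
    """Indices of cases and controls within `caliper` of each other on every given feature."""
    cases = [i for i, label in enumerate(y) if label == 1]
    controls = [i for i, label in enumerate(y) if label == 0]
    keep = set()
    for c in cases:
        for k in controls:
            if all(abs(X[c][j] - X[k][j]) <= caliper for j in features):
                keep.update((c, k))
    return sorted(keep)

def predict(X_train, y_train, row, features):
    """Nearest-centroid prediction (assumed model type) restricted to `features`."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [mean(r[j] for r in members) for j in features]
        dist = sum((row[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy (assumed performance metric)."""
    hits = 0
    for i in range(len(X)):
        hits += predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], features) == y[i]
    return hits / len(X)

# Hypothetical toy cohort: feature 0 separates most cases (1) from controls (0);
# feature 1 matters only where feature 0 is ambiguous; feature 2 is noise.
X, y = [], []
for i in range(8):
    noise = [float(i % 2), float((i // 2) % 2)]
    X.append([1.0] + noise); y.append(1)
    X.append([0.0] + noise); y.append(0)
X += [[0.5, 1.0, 0.0], [0.5, 1.0, 1.0], [0.5, 0.0, 0.0], [0.5, 0.0, 1.0]]
y += [1, 1, 0, 0]

all_features = list(range(3))
first = relevant_features(X, y, all_features)            # first set of features
subset = similar_subset(X, y, first)                     # look-alike cases and controls
remaining = [j for j in all_features if j not in first]
second = relevant_features([X[i] for i in subset],       # second set, found in the subset
                           [y[i] for i in subset], remaining)
acc_first = loo_accuracy(X, y, first)                    # first model: first set only
acc_second = loo_accuracy(X, y, first + second)          # second model: both sets
chosen = first + second if acc_second > acc_first else first
print(first, subset, second, acc_first, acc_second, chosen)
```

On this toy cohort the sketch picks feature 0 as the first set, isolates the four rows where feature 0 cannot tell cases from controls, and finds feature 1 only within that subset, since its gap across the whole cohort falls below the threshold. The second model is kept only if it scores higher than the first, mirroring the comparison step of the claim. For brevity the features are selected on the full cohort before the leave-one-out evaluation; a stricter evaluation would repeat the selection inside each fold.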