US 12,437,237 B2
Sequential synthesis and selection for feature engineering
Michael Langford, Plano, TX (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Jul. 7, 2022, as Appl. No. 17/859,978.
Prior Publication US 2024/0013089 A1, Jan. 11, 2024
Int. Cl. G06F 18/211 (2023.01); G06F 18/214 (2023.01); G06N 20/00 (2019.01)
CPC G06N 20/00 (2019.01) [G06F 18/211 (2023.01); G06F 18/2155 (2023.01)]
17 Claims
 
1. A computer-implemented method, executing on a computing device, comprising:
receiving a dataset with one or more existing features, wherein the received dataset is associated with a dataset label defining a machine-learning target attribute;
receiving an operation that can be applied to at least one of the one or more existing features of the dataset;
generating one or more potential features by applying the operation to the at least one of the one or more existing features;
for each of the one or more potential features:
applying a feature importance algorithm to the respective feature along with the one or more existing features, and
generating a respective feature importance value for the respective feature based on applying the feature importance algorithm;
generating respective feature importance values for each of the one or more existing features based on applying the feature importance algorithm; and
updating, based on applying the feature importance algorithm, the respective feature importance value for the respective feature by:
summing the respective feature importance values for the one or more existing features, and
dividing by the respective feature importance value;
sorting the generated one or more potential features by their respective feature importance values;
receiving a threshold level of correlation;
iterating through the sorted, generated one or more potential features until a total number of new features are added or until no more potential features are left by, for a given potential feature:
determining one or more correlations between the given potential feature and each of the one or more existing features; and
adding, based on determining that each of the one or more correlations is under the threshold level of correlation, the given potential feature to the dataset as a new feature; and
training, using the dataset, a machine learning model.
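
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the names `pearson`, `importance`, and `synthesize_and_select` are hypothetical; the feature importance algorithm is stood in for by the absolute Pearson correlation with the label; the operation is applied to every pair of existing features; candidates are sorted in ascending order of the updated value (the claim does not state a direction); and each correlation is checked against the original existing features only. The final training step is omitted.

```python
import math
from typing import Callable, Dict, List, Tuple

Column = List[float]


def pearson(x: Column, y: Column) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def importance(features: Dict[str, Column], label: Column) -> Dict[str, float]:
    # Stand-in importance algorithm (assumption): |corr(feature, label)|.
    return {name: abs(pearson(col, label)) for name, col in features.items()}


def synthesize_and_select(
    features: Dict[str, Column],
    label: Column,
    operation: Callable[[float, float], float],
    threshold: float,
    max_new: int,
) -> Tuple[Dict[str, Column], List[str]]:
    existing = list(features)

    # Generate potential features by applying the operation to pairs
    # of existing features (pairwise application is an assumption).
    candidates: Dict[str, Column] = {}
    for i, a in enumerate(existing):
        for b in existing[i + 1:]:
            candidates[f"op({a},{b})"] = [
                operation(u, v) for u, v in zip(features[a], features[b])
            ]

    # Score each candidate together with the existing features, then update
    # its value as: sum of existing importances / candidate importance.
    scores: Dict[str, float] = {}
    for name, col in candidates.items():
        imp = importance({**features, name: col}, label)
        existing_sum = sum(imp[e] for e in existing)
        # A zero importance maps to infinity so the candidate sorts last (assumption).
        scores[name] = existing_sum / imp[name] if imp[name] > 0 else math.inf

    # Ascending sort direction is an assumption; the claim only says "sorting".
    ranked = sorted(scores, key=lambda k: scores[k])

    # Add candidates whose correlation with every existing feature is under
    # the threshold, until max_new are added or no candidates remain.
    dataset = dict(features)
    added: List[str] = []
    for name in ranked:
        if len(added) >= max_new:
            break
        col = candidates[name]
        if all(abs(pearson(col, features[e])) < threshold for e in existing):
            dataset[name] = col
            added.append(name)
    return dataset, added


if __name__ == "__main__":
    cols = {"x1": [1.0, 2.0, 3.0, 4.0], "x2": [4.0, 1.0, 3.0, 2.0]}
    target = [4.0, 2.0, 9.0, 8.0]
    new_dataset, new_names = synthesize_and_select(
        cols, target, lambda u, v: u * v, threshold=0.9, max_new=1
    )
    print(new_names, sorted(new_dataset))
```

In this sketch the returned `dataset` would then be passed to whatever model-training routine is in use, corresponding to the final step of the claim.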