| CPC G06N 20/00 (2019.01) [G06F 18/211 (2023.01); G06F 18/2155 (2023.01)] | 17 Claims |

1. A computer-implemented method, executing on a computing device, comprising:
receiving a dataset with one or more existing features, wherein the received dataset is associated with a dataset label defining a machine-learning target attribute;
receiving an operation that can be applied to at least one of the one or more existing features of the dataset;
generating one or more potential features by applying the operation to the at least one of the one or more existing features;
for each of the one or more potential features:
applying a feature importance algorithm to the respective feature along with the one or more existing features, and
generating a respective feature importance value for the respective feature based on applying the feature importance algorithm;
generating respective feature importance values for each of the one or more existing features based on applying the feature importance algorithm; and
updating, based on applying the feature importance algorithm, the respective feature importance value for the respective feature by:
summing the respective feature importance values for the one or more existing features, and
dividing by the respective feature importance value;
sorting the generated one or more potential features by their respective feature importance values;
receiving a threshold level of correlation;
iterating through the sorted, generated one or more potential features until a total number of new features are added or until no more potential features are left by, for a given potential feature:
determining one or more correlations between the given potential feature and each of the one or more existing features; and
adding, based on determining that each of the one or more correlations is under the threshold level of correlation, the given potential feature to the dataset as a new feature; and
training, using the dataset, a machine learning model.
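The claim does not specify the feature importance algorithm, the correlation measure, or the sort direction. The sketch below shows one way the claimed loop could look, and every one of the following choices is an assumption:

- Importance is approximated by the absolute Pearson correlation between each feature and the label.
- Feature-to-feature correlation is also the absolute Pearson correlation.
- Candidates are sorted in ascending order of the updated value, which is the sum of the existing importances divided by the candidate's importance.
- Each candidate is compared only against the originally received features.

The names `pearson`, `importance`, and `add_generated_features` are hypothetical. The final model-training step is omitted.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Column = List[float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def importance(features: Dict[str, Column], label: Column) -> Dict[str, float]:
    """Hypothetical stand-in for the unspecified importance algorithm:
    |corr(feature, label)| for every feature passed in."""
    return {name: abs(pearson(col, label)) for name, col in features.items()}


def add_generated_features(
    dataset: Dict[str, Column],
    label: Column,
    op: Callable[[float], float],
    op_name: str,
    max_new: int,
    threshold: float,
) -> Tuple[Dict[str, Column], List[str]]:
    """Return (augmented dataset, names of added features); input is not mutated."""
    existing = dict(dataset)
    # Generate potential features by applying the operation to each existing feature.
    potential = {
        f"{op_name}({name})": [op(v) for v in col] for name, col in existing.items()
    }

    # Score each candidate together with the existing features, then update its
    # value to (sum of existing importances) / (candidate importance).
    scores: Dict[str, float] = {}
    for name, col in potential.items():
        imp = importance({**existing, name: col}, label)
        existing_total = sum(imp[e] for e in existing)
        scores[name] = existing_total / imp[name] if imp[name] > 0 else math.inf

    # Assumption: ascending order, because a smaller ratio means the candidate
    # carries more importance relative to the existing features.
    ranked = sorted(scores, key=scores.get)

    augmented = dict(existing)
    added: List[str] = []
    for name in ranked:
        if len(added) >= max_new:  # stop once the requested total is reached
            break
        col = potential[name]
        # Keep the candidate only if every correlation with an original
        # feature is under the threshold.
        if all(abs(pearson(col, existing[e])) < threshold for e in existing):
            augmented[name] = col
            added.append(name)
    return augmented, added
```

With a univariate proxy like the one above, the existing features' scores do not depend on the candidate. A model-based importance, such as tree-ensemble or permutation importance, would change them, which is why the sketch rescores the existing features for every candidate. Such a model-based importance could replace `importance` without changing the rest of the loop.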