US 11,941,497 B2
System and method of operationalizing automated feature engineering
James Max Kanter, Boston, MA (US); and Kalyan Kumar Veeramachaneni, Watertown, MA (US)
Assigned to Alteryx, Inc., Irvine, CA (US)
Filed by Alteryx, Inc., Irvine, CA (US)
Filed on Sep. 30, 2020, as Appl. No. 17/039,428.
Prior Publication US 2022/0101190 A1, Mar. 31, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 16/2457 (2019.01); G06F 16/28 (2019.01)
CPC G06N 20/00 (2019.01) [G06F 16/24578 (2019.01); G06F 16/285 (2019.01)]
17 Claims
 
1. A computer implemented method, comprising:
receiving a dataset from a data source;
selecting primitives from a pool of primitives based on the received dataset, each of the selected primitives configured to be applied to at least a portion of the dataset to synthesize one or more features, wherein the selecting comprises:
generating a semantic representation of the received dataset, and
selecting primitives associated with attributes matching the semantic representation of the received dataset;
synthesizing a plurality of features by applying the selected primitives to the received dataset;
iteratively evaluating the plurality of features to remove some features from the plurality of features to obtain a subset of features, each iteration comprising:
evaluating usefulness of at least some features of the plurality of features by applying a different portion of the dataset to the evaluated features, and
removing some of the evaluated features based on the usefulness of the evaluated features to produce the subset of features;
determining an importance factor for each feature of the subset of features, wherein the determining comprises:
ranking the subset of features by inputting the subset of features and a first portion of the dataset into a machine learning model, and
determining the importance factors of the subset of features based on their ranking; and
generating the machine learning model based on the subset of features and the importance factor of each feature of the subset of features, the machine learning model configured to be used to make a prediction based on new data.
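
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: the `PRIMITIVES` pool, the coarse "numeric"/"text" tags standing in for the semantic representation, absolute Pearson correlation standing in for both the usefulness measure and the model-based ranking, and rank-derived weights standing in for the importance factors. Training the final machine learning model is left out.

```python
import math
import random
from statistics import mean

# Hypothetical primitive pool: name -> (column tag it accepts, per-value function).
PRIMITIVES = {
    "square": ("numeric", lambda v: v * v),
    "log1p_abs": ("numeric", lambda v: math.log1p(abs(v))),
    "is_positive": ("numeric", lambda v: 1.0 if v > 0 else 0.0),
    "length": ("text", lambda v: float(len(v))),
}


def semantic_representation(rows, columns):
    """Tag each column as 'text' or 'numeric' (a crude stand-in for semantics)."""
    return {c: "text" if isinstance(rows[0][c], str) else "numeric" for c in columns}


def select_primitives(semantics):
    """Pair each column with the primitives whose accepted tag matches it."""
    return [(name, col)
            for col, tag in semantics.items()
            for name, (accepts, _) in PRIMITIVES.items()
            if accepts == tag]


def synthesize(rows, selected):
    """Apply each selected primitive to its column: feature name -> list of values."""
    return {f"{name}({col})": [PRIMITIVES[name][1](r[col]) for r in rows]
            for name, col in selected}


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 when either side is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / math.sqrt(sxx * syy)


def prune(features, target, n_iterations=3, threshold=0.1):
    """Each iteration scores the surviving features on a different slice of the
    data and drops those whose score falls below the (assumed) threshold."""
    n = len(target)
    kept = dict(features)
    for i in range(n_iterations):
        lo, hi = i * n // n_iterations, (i + 1) * n // n_iterations
        kept = {name: values for name, values in kept.items()
                if abs_corr(values[lo:hi], target[lo:hi]) >= threshold}
    return kept


def importance_factors(features, target, first_portion):
    """Rank features by their score on the first portion of the data, then
    convert rank positions into weights that sum to 1."""
    scores = {name: abs_corr(values[:first_portion], target[:first_portion])
              for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    total = len(ranked) * (len(ranked) + 1) / 2
    return {name: (len(ranked) - pos) / total for pos, name in enumerate(ranked)}


# Toy dataset (made up for this sketch): the target is the square of column "x".
random.seed(0)
rows = [{"x": (i - 15) / 5, "note": "ab" * (i % 4)} for i in range(30)]
random.shuffle(rows)
target = [r["x"] ** 2 for r in rows]

semantics = semantic_representation(rows, ["x", "note"])
features = synthesize(rows, select_primitives(semantics))
kept = prune(features, target)
weights = importance_factors(kept, target, first_portion=len(rows) // 2)
best = max(weights, key=weights.get)
print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

In this toy setup the feature `square(x)` reproduces the target exactly, so it survives every pruning pass and receives the largest weight. A fuller sketch would pass the surviving features and their weights to a model-fitting step, which the claim recites as the final operation.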