US 11,699,094 B2
Automatic feature selection and model generation for linear models
Paul Walter Hubenig, San Francisco, CA (US)
Assigned to Salesforce, Inc., San Francisco, CA (US)
Filed by Salesforce, Inc., San Francisco, CA (US)
Filed on Oct. 31, 2018, as Appl. No. 16/177,107.
Prior Publication US 2020/0134363 A1, Apr. 30, 2020
Int. Cl. G06N 3/08 (2023.01); G06F 15/18 (2006.01); G06N 20/00 (2019.01); G06F 17/17 (2006.01); G06N 7/00 (2023.01); G06F 18/2115 (2023.01); G06F 18/20 (2023.01)
CPC G06N 20/00 (2019.01) [G06F 17/17 (2013.01); G06F 18/2115 (2023.01); G06F 18/285 (2023.01); G06N 7/00 (2013.01)] 20 Claims
 
1. A method for automated feature selection for linear model generation at an application server, comprising:
determining, for a set of data features related to a plurality of data records, a set of relevance measurements, wherein each relevance measurement of the set of relevance measurements corresponds to a respective feature of the set of data features;
selecting a subset of the set of data features based at least in part on the set of relevance measurements;
generating a matrix based at least in part on the selected subset of the set of data features, wherein generating the matrix comprises iteratively scanning the plurality of data records, and wherein the matrix enables computation of feature coefficients for the selected subset of the set of data features based at least in part on an increasing penalty value;
sorting the selected subset of the set of data features based at least in part on the increasing penalty value, wherein a first data feature of the subset of data features has a greater priority than a second data feature of the subset of data features based at least in part on the first data feature being set to zero later than the second data feature;
determining, according to the sorting, a plurality of nested linear models comprising a first nested linear model and a second nested linear model, wherein the first nested linear model comprises the first data feature and the second nested linear model comprises the first data feature and the second data feature; and
selecting a linear model of the plurality of nested linear models based at least in part on a model quality criterion comprising an Akaike information criterion (AIC) and the plurality of nested linear models.
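Claim 1 does not fix a particular relevance measurement, subset-selection rule, or matrix layout. The sketch below is illustrative only and rests on assumptions not stated in the claim. It uses absolute Pearson correlation as the relevance measurement and a fixed threshold to choose the subset. For the matrix, it uses a centered Gram matrix (X^T X, X^T y, y^T y) accumulated in a single scan over the records, so that coefficients for any penalty can later be computed without revisiting the data. The names `relevance_scores`, `select_features`, `centered_gram`, and the `Record` layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout: (feature values, target value).
Record = Tuple[Sequence[float], float]


def relevance_scores(records: List[Record]) -> List[float]:
    """Absolute Pearson correlation of each feature with the target,
    computed from sums accumulated in one pass over the records."""
    n = len(records)
    d = len(records[0][0])
    sx, sxx, sxy = [0.0] * d, [0.0] * d, [0.0] * d
    sy = syy = 0.0
    for x, y in records:
        sy += y
        syy += y * y
        for j in range(d):
            sx[j] += x[j]
            sxx[j] += x[j] * x[j]
            sxy[j] += x[j] * y
    var_y = syy - sy * sy / n
    scores = []
    for j in range(d):
        var_x = sxx[j] - sx[j] * sx[j] / n
        cov = sxy[j] - sx[j] * sy / n
        # Zero-variance features (or a constant target) get relevance 0.
        denom = math.sqrt(var_x * var_y) if var_x > 0 and var_y > 0 else 0.0
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    return scores


def select_features(scores: List[float], threshold: float) -> List[int]:
    """Keep the indices of features whose relevance meets the threshold."""
    return [j for j, s in enumerate(scores) if s >= threshold]


def centered_gram(records: List[Record], subset: List[int]):
    """Scan the records once, accumulating sums and cross-products for the
    selected subset, then center them (equivalent to fitting an intercept).
    Returns (XtX, Xty, yty, n) for the centered data."""
    n, k = len(records), len(subset)
    s = [0.0] * k
    sxx = [[0.0] * k for _ in range(k)]
    sxy = [0.0] * k
    sy = syy = 0.0
    for x, y in records:
        v = [x[j] for j in subset]
        sy += y
        syy += y * y
        for a in range(k):
            s[a] += v[a]
            sxy[a] += v[a] * y
            for b in range(k):
                sxx[a][b] += v[a] * v[b]
    xtx = [[sxx[a][b] - s[a] * s[b] / n for b in range(k)] for a in range(k)]
    xty = [sxy[a] - s[a] * sy / n for a in range(k)]
    yty = syy - sy * sy / n
    return xtx, xty, yty, n


# Example: feature 0 tracks the target, feature 1 is constant, feature 2 is weak.
example = [((1.0, 5.0, 1.0), 2.0), ((2.0, 5.0, -1.0), 4.0),
           ((3.0, 5.0, 1.0), 6.0), ((4.0, 5.0, -1.0), 8.0)]
scores = relevance_scores(example)
subset = select_features(scores, threshold=0.5)   # subset == [0]
xtx, xty, yty, n = centered_gram(example, subset)
```

Because only sums and cross-products are kept, the scan is a streaming pass whose memory grows with the square of the subset size, not with the number of records.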
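The ordering, nested-model, and AIC steps of claim 1 can be illustrated starting from a centered Gram matrix. This is a sketch under assumptions not stated in the claim. The increasing penalty is taken to be an L1 (lasso) penalty, solved by coordinate descent over an increasing geometric grid with warm starts. A feature's priority is the largest grid penalty at which its coefficient is still nonzero, so features set to zero later rank higher, with ties broken by feature index. Each nested model is refit by ordinary least squares. AIC is used in its Gaussian form n*ln(RSS/n) + 2p, where p counts the coefficients plus the intercept. The functions `lasso_path_order`, `solve`, and `select_by_aic` are hypothetical names.

```python
import math
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_path_order(xtx, xty, n, num_lambdas=50, eps=1e-3, sweeps=200) -> List[int]:
    """Sort features so that those zeroed out at larger penalties come first."""
    d = len(xty)
    lam_max = max(abs(c) for c in xty) / n  # smallest penalty zeroing every coefficient
    # Increasing geometric grid from lam_max * eps up to lam_max.
    lambdas = [lam_max * eps ** (1 - i / (num_lambdas - 1)) for i in range(num_lambdas)]
    beta = [0.0] * d
    last_nonzero = [0.0] * d
    for lam in lambdas:
        for _ in range(sweeps):  # coordinate descent, warm-started from the previous lam
            max_change = 0.0
            for j in range(d):
                if xtx[j][j] == 0:
                    continue
                rho = (xty[j] - sum(xtx[j][k] * beta[k] for k in range(d) if k != j)) / n
                new = soft_threshold(rho, lam) / (xtx[j][j] / n)
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
            if max_change < 1e-10:
                break
        for j in range(d):
            if beta[j] != 0.0:
                last_nonzero[j] = lam
    return sorted(range(d), key=lambda j: -last_nonzero[j])


def solve(a, b):
    """Gaussian elimination with partial pivoting; assumes a nonsingular system."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def select_by_aic(xtx, xty, yty, n, order):
    """Fit nested OLS models on order[:1], order[:2], ... and keep the lowest AIC."""
    best = None
    for k in range(1, len(order) + 1):
        feats = order[:k]
        sub_xtx = [[xtx[a][b] for b in feats] for a in feats]
        sub_xty = [xty[a] for a in feats]
        coef = solve(sub_xtx, sub_xty)
        rss = yty - sum(c * v for c, v in zip(coef, sub_xty))  # OLS residual sum of squares
        aic = n * math.log(max(rss, 1e-300) / n) + 2 * (k + 1)  # +1 for the intercept
        if best is None or aic < best[0]:
            best = (aic, feats, coef)
    return best


# Example: an orthogonal design whose centered Gram matrix is 100 * I.
demo_xtx = [[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 100.0]]
demo_xty = [200.0, -50.0, 100.0]
order = lasso_path_order(demo_xtx, demo_xty, n=100)
aic, features, coef = select_by_aic(demo_xtx, demo_xty, yty=526.0, n=100, order=order)
```

With an orthogonal design, each lasso coefficient is zeroed exactly when the penalty reaches its marginal signal, so the ordering follows |X^T y|. When a trailing feature barely reduces the RSS, the 2p term makes AIC stop at a shorter nested model.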