CPC G06F 16/2365 (2019.01) [G06F 16/215 (2019.01); G06F 17/18 (2013.01)]; 20 Claims
1. A computer system, comprising:
one or more processors; and
one or more non-transitory machine-readable media coupled to the one or more processors and storing computer program code comprising sets of instructions executable by the one or more processors to:
obtain an input dataset comprising one or more continuous features and one or more categorical features;
determine a number of categorical feature categories based on the one or more categorical features;
determine record counts for each of the categorical feature categories;
calculate skew statistics for each category based on the record counts for each of the categorical feature categories;
determine cardinality skew factors for each of the one or more categorical features based on the record counts for each of the categorical feature categories and the skew statistics for each category, wherein each cardinality skew factor indicates the distribution of the categories of the respective categorical feature;
select a number of the one or more categorical features having the highest cardinality skew factors from among the cardinality skew factors for each of the one or more categorical features;
distribute a top contributor deviation analysis across multiple cloud computing instances; and
perform the top contributor deviation analysis on the multiple cloud computing instances using the selected number of the one or more categorical features having the highest cardinality skew factors.
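The claim does not say how the skew statistics or the cardinality skew factors are computed. The Python sketch below illustrates the counting, scoring, and selection steps under two assumptions of ours, not formulas from the claim: the skew statistic of a category is its record count divided by the count expected under a uniform split (N/K), and the cardinality skew factor is the record-share-weighted mean of those statistics, which works out to K·Σp_k². Under that assumption a uniformly distributed feature scores 1.0 and a feature where one of K categories dominates approaches K. The function names and sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical sketch of the feature-selection steps of claim 1.
# The skew statistic and cardinality skew factor formulas are assumptions.


def category_counts(rows, feature):
    """Record count for each category of one categorical feature."""
    return Counter(row[feature] for row in rows)


def skew_statistics(counts):
    """Assumed skew statistic: category count / count expected under a uniform split."""
    expected = sum(counts.values()) / len(counts)
    return {cat: c / expected for cat, c in counts.items()}


def cardinality_skew_factor(counts, skew):
    """Assumed factor: record-share-weighted mean of the skew statistics.

    Equals 1.0 when all K categories are equally frequent and approaches K
    when a single category holds almost every record.
    """
    total = sum(counts.values())
    return sum(counts[cat] / total * skew[cat] for cat in counts)


def select_features(rows, categorical_features, n):
    """Return the n categorical features with the highest factors, plus all factors."""
    factors = {}
    for feature in categorical_features:
        counts = category_counts(rows, feature)
        factors[feature] = cardinality_skew_factor(counts, skew_statistics(counts))
    ranked = sorted(factors, key=factors.get, reverse=True)
    return ranked[:n], factors


# Hypothetical input: one continuous feature (latency_ms) and three categorical ones.
rows = [
    {"region": "us", "device": "mobile", "channel": "web", "latency_ms": 120.0},
    {"region": "us", "device": "mobile", "channel": "web", "latency_ms": 135.0},
    {"region": "us", "device": "desktop", "channel": "web", "latency_ms": 98.0},
    {"region": "us", "device": "desktop", "channel": "web", "latency_ms": 110.0},
    {"region": "us", "device": "mobile", "channel": "app", "latency_ms": 240.0},
    {"region": "eu", "device": "desktop", "channel": "web", "latency_ms": 105.0},
    {"region": "apac", "device": "mobile", "channel": "web", "latency_ms": 180.0},
    {"region": "latam", "device": "desktop", "channel": "web", "latency_ms": 160.0},
]

selected, factors = select_features(rows, ["region", "device", "channel"], n=2)
print(selected)  # ['region', 'channel']
print(factors)   # {'region': 1.75, 'device': 1.0, 'channel': 1.5625}
```

With this assumed factor, "region" ranks first because it combines four categories with one dominant category, while the evenly split "device" feature scores the minimum of 1.0.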
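The claim likewise leaves the top contributor deviation analysis and the distribution mechanism unspecified. As one hypothetical reading, the analysis for each selected categorical feature runs as an independent task: for every category, take the change in a metric's total between a baseline window and a current window, and report the category with the largest absolute change. In this sketch a local `ThreadPoolExecutor` only stands in for the multiple cloud computing instances; the metric name `errors`, the `workers` parameter, and all function names are assumptions for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: one top-contributor task per selected feature, fanned out
# to a worker pool. Threads stand in for the claim's cloud computing instances.


def metric_totals(rows, feature, metric):
    """Sum of the metric for each category of a feature."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[feature]] += row[metric]
    return totals


def top_contributor(feature, baseline, current, metric):
    """Assumed analysis: the category whose metric total changed the most."""
    base = metric_totals(baseline, feature, metric)
    cur = metric_totals(current, feature, metric)
    # A category missing from one window counts as a total of 0.0 there;
    # sorting the categories keeps tie-breaking deterministic.
    deltas = {cat: cur.get(cat, 0.0) - base.get(cat, 0.0)
              for cat in sorted(set(base) | set(cur))}
    top = max(deltas, key=lambda cat: abs(deltas[cat]))
    return feature, top, deltas[top]


def distributed_top_contributors(features, baseline, current, metric, workers=4):
    """Run one analysis task per feature on a worker pool; results keep feature order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(top_contributor, f, baseline, current, metric)
                   for f in features]
        return [future.result() for future in futures]


# Hypothetical baseline and current windows for the selected features.
baseline = [
    {"region": "us", "channel": "web", "errors": 10.0},
    {"region": "eu", "channel": "web", "errors": 4.0},
    {"region": "us", "channel": "app", "errors": 2.0},
]
current = [
    {"region": "us", "channel": "web", "errors": 11.0},
    {"region": "eu", "channel": "web", "errors": 9.0},
    {"region": "us", "channel": "app", "errors": 2.0},
]

results = distributed_top_contributors(["region", "channel"], baseline, current, "errors")
print(results)  # [('region', 'eu', 5.0), ('channel', 'web', 6.0)]
```

Splitting the work per feature keeps each task independent, so the same function could in principle be shipped to remote workers; how the claimed system actually partitions the analysis is not stated in the claim.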