US 11,914,622 B2
Efficient factor analysis on large datasets using categorical variables
Nikita Iserson, Sankt-Leon Rot (DE); Sawinder Kaur, Bangalore (IN); Yogeshwaran Kandasamy, Puducherry (IN); Balaji Elumalai, Bangalore (IN); Ashish Tripathy, Bangalore (IN); and Karl-Peter Nos, Nussloch (DE)
Assigned to SAP SE, Walldorf (DE)
Filed by SAP SE, Walldorf (DE)
Filed on May 27, 2020, as Appl. No. 16/884,454.
Claims priority of application No. 202011015547 (IN), filed on Apr. 9, 2020.
Prior Publication US 2021/0319045 A1, Oct. 14, 2021
Int. Cl. G06F 16/00 (2019.01); G06F 16/22 (2019.01); G06F 16/2455 (2019.01); G06F 16/28 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 16/221 (2019.01); G06F 16/2456 (2019.01); G06N 20/00 (2019.01)]
19 Claims
 
1. In a computer system that implements a factor analysis tool on one or more hardware processors with memory coupled thereto, a method comprising:
receiving a request from a client, the request specifying a plurality of root factors, and responsive to the request:
performing stratified sampling on a population of records to generate a sample of records by:
combining the root factors into a joint factor;
determining respective values of the joint factor for each record of the population of records;
wherein each of the values is shared by a respective subgroup of the population of records, the subgroup having a cardinality; and
for each of the subgroups, selecting a number of members of the sample of records from the subgroup, the number proportional to the cardinality of the subgroup;
analyzing the sample of records;
identifying, based on the analyzing, a plurality of key factors for a target metric, wherein the records of the population of records store values for each of the key factors from a set of values (“factor values”) for that key factor;
making evaluations of the key factors for the population of records and determining a plurality of scores by, for each of the key factors and at least one respective factor value:
determining a corresponding score, of the plurality of scores, over all of the records of the population, for that factor value;
ranking the plurality of scores; and
based on the ranking:
determining one or more of the factor values having highest-ranked scores according to a predetermined criterion; and
transmitting the one or more determined factor values to the client.
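The stratified-sampling steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Records are assumed to be dicts, and the joint factor is assumed to be the tuple of a record's root-factor values. The hypothetical function `stratified_sample` also rounds each subgroup's proportional share to the nearest integer, so the total drawn can differ slightly from `sample_size`.

```python
import random
from collections import defaultdict


def stratified_sample(records, root_factors, sample_size, seed=0):
    """Proportional stratified sample keyed on a joint factor (illustrative sketch).

    Assumptions (not from the source): records are dicts, the joint factor is
    the tuple of root-factor values, and shares are rounded to the nearest int.
    """
    rng = random.Random(seed)
    population = len(records)

    # Combine the root factors into a joint factor and group records by its value.
    # Each distinct joint value defines one subgroup (stratum).
    strata = defaultdict(list)
    for rec in records:
        joint_value = tuple(rec[f] for f in root_factors)
        strata[joint_value].append(rec)

    sample = []
    for members in strata.values():
        # Number drawn is proportional to the subgroup's cardinality.
        n = min(len(members), round(sample_size * len(members) / population))
        sample.extend(rng.sample(members, n))
    return sample
```

Keying strata on the full tuple of root-factor values means every combination that occurs in the population forms its own subgroup, which mirrors the claim's notion of one subgroup per joint-factor value.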
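The claim leaves both the scoring function and the "predetermined criterion" open. The sketch below assumes, purely for illustration, that a factor value's score is the mean of the target metric over all population records holding that value minus the overall mean, and that the criterion is "keep the top k". The function name `top_factor_values` and the parameter `k` are hypothetical.

```python
from collections import defaultdict


def top_factor_values(records, key_factors, target, k=3):
    """Score each (key factor, factor value) over the full population, rank, keep top k.

    Assumptions (not from the source): score = mean target for the value
    minus the overall mean target; the selection criterion is top-k.
    """
    overall = sum(r[target] for r in records) / len(records)

    # Accumulate the target metric per (factor, value) over every record.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        for factor in key_factors:
            key = (factor, rec[factor])
            totals[key] += rec[target]
            counts[key] += 1

    # Hypothetical score: deviation of the value's mean from the overall mean.
    scores = {key: totals[key] / counts[key] - overall for key in totals}

    # Rank scores from highest to lowest and apply the top-k criterion.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

Because the scores are computed over all records of the population rather than over the sample, this step can be run after the sample-based analysis has narrowed the candidate key factors.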