CPC G06F 16/2264 (2019.01) [G06F 12/0875 (2013.01); G06F 16/13 (2019.01); G06F 16/148 (2019.01); G06F 16/172 (2019.01); G06F 16/182 (2019.01); G06F 16/283 (2019.01); G06F 2212/601 (2013.01)] | 20 Claims |
1. A system for generating a multidimensional data cube, comprising:
a computer comprising one or more microprocessors;
a data processing cluster executing on the one or more microprocessors and operable to:
receive, from one or more data sources, a source data comprising a plurality of columns of data;
combine each of a plurality of categorical columns within the source data with each of a plurality of numerical columns within the source data, to generate a plurality of data column combinations from the source data;
generate a plurality of key-value pairs corresponding to the plurality of data column combinations and row values in the source data;
collect values paired with a same key to determine one or more aggregate numerical values or frequency values within the source data;
generate a plurality of output files, including for each of the plurality of data column combinations generated for the source data, an output file that stores a pre-computed result of a query on the source data represented by the aggregate numerical values or the frequency values;
store the plurality of output files into a data cube, wherein the data cube stores the pre-computed results for the possible queries on the plurality of columns of the source data;
generate a mapping string for each of the plurality of output files in the data cube and indicative of a column of the source data; and
upon receiving another query from a client application, utilize a generated mapping string to map the received another query to one of the plurality of output files in order to provide, in response to the another query, a pre-computed result stored at the one of the plurality of output files.
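For illustration only, the steps recited in claim 1 can be sketched in Python as below. This is a minimal single-process sketch under stated assumptions, not the claimed cluster implementation: the names `ROWS`, `build_cube`, `mapping_string`, and `answer` are hypothetical, the `categorical__numerical` mapping-string format is an assumption (the claim only requires that the string be indicative of a column of the source data), and an in-memory dictionary stands in for the plurality of output files.

```python
from collections import defaultdict
from itertools import product

# Hypothetical source data: two categorical and two numerical columns.
ROWS = [
    {"region": "east", "product": "a", "sales": 10, "units": 1},
    {"region": "east", "product": "b", "sales": 5, "units": 2},
    {"region": "west", "product": "a", "sales": 7, "units": 3},
]


def mapping_string(cat_col, num_col):
    """Assumed format: names the columns that an output file covers."""
    return f"{cat_col}__{num_col}"


def build_cube(rows, categorical, numerical):
    """Pre-compute one result per (categorical, numerical) column combination."""
    cube = {}
    # Combine each categorical column with each numerical column.
    for cat_col, num_col in product(categorical, numerical):
        # Generate key-value pairs from the row values.
        pairs = [(row[cat_col], row[num_col]) for row in rows]
        # Collect the values paired with the same key.
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        # Aggregate numerical value (sum) and frequency value (count) per key.
        result = {
            key: {"sum": sum(values), "count": len(values)}
            for key, values in grouped.items()
        }
        # Each dictionary entry stands in for one output file in the cube.
        cube[mapping_string(cat_col, num_col)] = result
    return cube


def answer(cube, cat_col, num_col):
    """Map an incoming query to its pre-computed result via the mapping string."""
    return cube[mapping_string(cat_col, num_col)]


if __name__ == "__main__":
    cube = build_cube(ROWS, ["region", "product"], ["sales", "units"])
    print(answer(cube, "region", "sales"))
    # {'east': {'sum': 15, 'count': 2}, 'west': {'sum': 7, 'count': 1}}
```

In this sketch a later query such as "sales by region" is never evaluated against the source rows; it is resolved by a single lookup keyed on the mapping string, which mirrors the final element of the claim.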