CPC G06F 16/2428 (2019.01) [G06F 16/217 (2019.01); G06F 16/221 (2019.01); G06F 16/2453 (2019.01)] | 19 Claims |
1. A method for generating a histogram, the method comprising the steps of:
(a) initializing a working histogram for a data column, the data column comprising a plurality of data values, wherein the working histogram comprises a plurality of rows, each row corresponding to a unique data value from the plurality of data values;
(b) capturing one or more queries, wherein the one or more queries comprise one or more predicate literals;
(c) generating a weight vector based on the predicate literals, wherein the weight vector comprises a plurality of weight values, each weight value corresponding to one of the plurality of rows of the working histogram;
(d) calculating a cost value for each row of the working histogram, wherein each cost value is determined based at least in part on an information loss and the weight value corresponding to one of the rows of the working histogram;
(e) identifying a first row in the working histogram having the lowest cost value among the plurality of rows in the working histogram; and
(f) merging the first row of the working histogram with a second row of the working histogram,
wherein the information loss is calculated as the entropy value of a merged row in the working histogram less the sum of the entropy values of the first row and the second row in the working histogram, wherein the merged row comprises the first row and the second row.
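The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details the claim leaves open are filled in here as labeled assumptions. All function names (`row_entropy`, `information_loss`, `weight_vector`, `build_histogram`) and the `target_rows` parameter are hypothetical. The sketch also assumes the following:

- A row's entropy is its total count times the Shannon entropy (natural log) of the values it holds. Under this definition, the information loss of step (f), H(merged) - (H(first) + H(second)), is never negative.
- The weight of a row is 1 plus the number of captured predicate literals that fall inside the row's value range.
- The cost of a row is its weight times the loss of merging it with its right-hand neighbour, which serves as the "second row".
- The single merge of steps (c) to (f) is repeated until `target_rows` rows remain.

```python
import math
from collections import Counter


def row_entropy(counts):
    """Hypothetical row entropy: total count times the Shannon entropy
    (natural log) of the value distribution inside the row."""
    total = sum(counts.values())
    return -sum(c * math.log(c / total) for c in counts.values() if c > 0)


def information_loss(row_a, row_b):
    """Entropy of the merged row less the sum of the two rows' entropies."""
    merged = row_a + row_b  # Counter addition combines the value counts
    return row_entropy(merged) - (row_entropy(row_a) + row_entropy(row_b))


def weight_vector(rows, literals):
    """Hypothetical weighting: 1 plus the number of predicate literals
    falling inside each row's [min, max] value range."""
    weights = []
    for row in rows:
        lo, hi = min(row), max(row)
        weights.append(1 + sum(1 for lit in literals if lo <= lit <= hi))
    return weights


def build_histogram(column, literals, target_rows):
    """Sketch of steps (a)-(f), repeated until target_rows rows remain.
    `literals` stands in for the predicate literals captured in step (b)."""
    # (a) one row per unique data value, kept in sorted value order
    rows = [Counter({v: c}) for v, c in sorted(Counter(column).items())]
    while len(rows) > max(target_rows, 1):
        # (c) one weight per row, derived from the predicate literals
        weights = weight_vector(rows, literals)
        # (d) cost of each row = weight * loss of merging with its right neighbour
        costs = [weights[i] * information_loss(rows[i], rows[i + 1])
                 for i in range(len(rows) - 1)]
        # (e) the row with the lowest cost
        i = min(range(len(costs)), key=costs.__getitem__)
        # (f) merge that row with the adjacent (second) row
        rows[i:i + 2] = [rows[i] + rows[i + 1]]
    return rows
```

As a usage example, `build_histogram([1, 1, 1, 2, 3, 3, 7, 7, 7, 7, 8], [7, 7, 8], 3)` first merges the rows for 2 and 3, then folds 1 into that row. This leaves three rows holding the values {1, 2, 3}, {7} and {8}. Because 7 and 8 appear as predicate literals, their higher weights make merging them costlier, so they keep their own rows. This is the intended effect of weighting the cost by query workload.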