CPC G06F 16/2365 (2019.01) [G06F 16/535 (2019.01); G06F 16/5866 (2019.01)] | 23 Claims |
16. A computing system, comprising:
memory configured to store a set of monotonic distinct count sketches associated with a plurality of distributed data sets, the plurality of distributed data sets having corresponding labels and corresponding pluralities of distinct items; and
one or more processors operatively coupled to the memory, the one or more processors being configured to:
initialize the set of sketches, each sketch in the set being associated with an accuracy parameter, the accuracy parameter indicating an approximation accuracy for that sketch; and
perform, for each of the plurality of distributed data sets, a query to determine whether a given label associated with that distributed data set is in a corresponding sketch of the set of sketches, wherein:
when the given label is in the corresponding sketch, then insert the distinct item associated with the given label into the corresponding sketch; and
when the given label is not in the corresponding sketch, then:
when a number of labels in the corresponding sketch is less than a specified size, add the given label along with a new sketch, and insert the distinct item associated with the given label into the new sketch; and
when the number of labels in the corresponding sketch is greater than or equal to the specified size, add the given label and assign it to a selected one of the set of sketches associated with a minimum label size, and insert the distinct item associated with the given label into the selected sketch.
|
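The control flow recited in claim 16 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: the class names `KMVSketch` and `BoundedLabelSketches` are hypothetical; a k-minimum-values sketch stands in for the "monotonic distinct count sketch"; the parameter `k` plays the role of the accuracy parameter; and "minimum label size" is read as the sketch with the smallest current distinct-count estimate, which is one possible interpretation of the claim language.

```python
import hashlib
import heapq


class KMVSketch:
    """k-minimum-values distinct counter (assumed stand-in for the claimed sketch)."""

    def __init__(self, k):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k              # accuracy parameter: larger k gives a tighter estimate
        self._heap = []         # negated hashes, so the largest kept hash is at index 0
        self._kept = set()      # hashes currently retained, used to skip duplicates

    @staticmethod
    def _hash(item):
        # Map the item to a pseudo-random value in [0, 1).
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def insert(self, item):
        h = self._hash(item)
        if h in self._kept:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def estimate(self):
        # Exact while fewer than k distinct hashes are held; never decreases.
        if len(self._heap) < self.k:
            return float(len(self._heap))
        return (self.k - 1) / -self._heap[0]


class BoundedLabelSketches:
    """Label-to-sketch table following the branches of claim 16 (illustrative only)."""

    def __init__(self, specified_size, k):
        self.specified_size = specified_size
        self.k = k
        self.sketch_of = {}     # label -> KMVSketch; labels may share a sketch

    def process(self, label, item):
        sketch = self.sketch_of.get(label)
        if sketch is None:
            if len(self.sketch_of) < self.specified_size:
                # Room left: add the label along with a new sketch.
                sketch = KMVSketch(self.k)
            else:
                # Table full: assign the label to an existing sketch.
                # ASSUMPTION: "minimum label size" = smallest current estimate.
                sketch = min(self.sketch_of.values(), key=lambda s: s.estimate())
            self.sketch_of[label] = sketch
        # In every branch the distinct item ends up in the label's sketch.
        sketch.insert(item)


table = BoundedLabelSketches(specified_size=2, k=16)
for label, item in [("a", 1), ("a", 2), ("a", 3), ("a", 1), ("b", 1), ("c", 9)]:
    table.process(label, item)
```

In this run, label `c` arrives after two labels are already stored, so it is assigned to the sketch already held by `b`, which has the smaller estimate at that point. Under this reading the number of sketches stays bounded by the specified size while the number of labels can keep growing.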