US 12,111,819 B1
Sampling space-saving set sketches
Homin K. Lee, Brooklyn, NY (US); and Charles-Philippe Masson, Paris (FR)
Assigned to DataDog Inc., New York City, NY (US)
Filed by DataDog Inc., New York City, NY (US)
Filed on Aug. 30, 2023, as Appl. No. 18/239,810.
Int. Cl. G06F 16/00 (2019.01); G06F 16/23 (2019.01); G06F 16/535 (2019.01); G06F 16/58 (2019.01)
CPC G06F 16/2365 (2019.01) [G06F 16/535 (2019.01); G06F 16/5866 (2019.01)]
23 Claims
 
16. A computing system, comprising:
memory configured to store a set of monotonic distinct count sketches associated with a plurality of distributed data sets, the plurality of distributed data sets having corresponding labels and corresponding pluralities of distinct items; and
one or more processors operatively coupled to the memory, the one or more processors being configured to:
initialize the set of sketches, each sketch in the set being associated with an accuracy parameter, the accuracy parameter indicating an approximation accuracy for that sketch; and
perform, for each of the plurality of distributed data sets, a query to determine whether a given label associated with that distributed data set is in a corresponding sketch of the set of sketches, wherein:
when the given label is in the corresponding sketch, then insert the distinct item associated with the given label into the corresponding sketch; and
when the given label is not in the corresponding sketch, then:
when a number of labels in the corresponding sketch is less than a specified size, add the given label along with a new sketch, and insert the distinct item associated with the given label into the new sketch; and
when the number of labels in the corresponding sketch is greater than or equal to the specified size, add the given label and assign it to a selected one of the set of sketches associated with a minimum label size, and insert the distinct item associated with the given label into the selected sketch.
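The insertion procedure recited in claim 16 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The names `KMVSketch`, `SpaceSavingSetSketch`, `add`, and `estimate` are hypothetical. A k-minimum-values (KMV) sketch is swapped in as the monotonic distinct count sketch, with `k` standing in for the accuracy parameter. Following the classic Space-Saving algorithm, the label previously tracked by the minimum-estimate sketch is evicted when the new label takes that sketch over; the claim text does not state this explicitly.

```python
import bisect
import hashlib


def _hash64(item: str) -> int:
    """Deterministic 64-bit hash of an item (BLAKE2b with an 8-byte digest)."""
    return int.from_bytes(hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest(), "big")


class KMVSketch:
    """Hypothetical k-minimum-values distinct-count sketch.

    k stands in for the claim's accuracy parameter (relative error shrinks
    roughly like 1/sqrt(k)). The estimate never decreases as items are
    inserted, so the sketch is monotonic.
    """

    def __init__(self, k: int) -> None:
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self._mins: list[int] = []  # sorted k smallest distinct hash values seen so far

    def insert(self, item: str) -> None:
        h = _hash64(item)
        if len(self._mins) == self.k and h >= self._mins[-1]:
            return  # not among the k smallest hashes: estimate unchanged
        i = bisect.bisect_left(self._mins, h)
        if i < len(self._mins) and self._mins[i] == h:
            return  # item already seen
        self._mins.insert(i, h)
        if len(self._mins) > self.k:
            self._mins.pop()  # drop the largest so only k values remain

    def estimate(self) -> float:
        if len(self._mins) < self.k:
            return float(len(self._mins))  # exact while fewer than k distinct hashes
        kth = (self._mins[-1] + 1) / 2.0**64  # k-th smallest hash mapped into (0, 1]
        return (self.k - 1) / kth


class SpaceSavingSetSketch:
    """Hypothetical label -> sketch map bounded to `size` labels (the claim's "specified size")."""

    def __init__(self, size: int, k: int) -> None:
        self.size = size
        self.k = k
        self.sketches: dict[str, KMVSketch] = {}

    def add(self, label: str, item: str) -> None:
        sketch = self.sketches.get(label)
        if sketch is None:  # label is not in the set of sketches
            if len(self.sketches) < self.size:
                sketch = KMVSketch(self.k)  # room left: pair the label with a new sketch
            else:
                # Full: take over the sketch with the smallest estimated count and
                # evict its label (assumption borrowed from classic Space-Saving).
                victim = min(self.sketches, key=lambda lbl: self.sketches[lbl].estimate())
                sketch = self.sketches.pop(victim)
            self.sketches[label] = sketch
        sketch.insert(item)  # insert the distinct item into the label's sketch


# Usage: two label slots; the third label reuses the smallest sketch.
s = SpaceSavingSetSketch(size=2, k=64)
for i in range(10):
    s.add("a", f"x{i}")
for i in range(3):
    s.add("b", f"y{i}")
s.add("c", "z0")  # "b" has the smallest estimate, so "c" inherits its sketch
print(sorted(s.sketches), s.sketches["c"].estimate())
```

Because the new label inherits the evicted sketch's contents, its estimate can only overstate its true distinct count, mirroring how a Space-Saving counter overestimates the frequency of an item that replaces the minimum counter. Using a monotonic sketch keeps the "minimum" comparison meaningful, since no sketch's estimate can shrink after insertions.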