US 11,886,413 B1
Time-sliced approximate data structure for storing group statistics
Miguel Angel Casanova, Dublin (IE); and David Christopher Tracey, Monasterboice (IE)
Assigned to Rapid7, Inc., Boston, MA (US)
Filed by Rapid7, Inc., Boston, MA (US)
Filed on Jul. 22, 2020, as Appl. No. 16/936,013.
Int. Cl. G06F 16/22 (2019.01); G06F 16/2458 (2019.01); G06F 16/2455 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/2255 (2019.01) [G06F 16/2264 (2019.01); G06F 16/2477 (2019.01); G06F 16/24556 (2019.01); G06F 16/285 (2019.01)]
20 Claims
 
1. A method comprising:
performing, by a computing system having one or more hardware processors and associated memory:
creating in memory a time-sliced approximate data structure (TSADS) that includes a counts matrix and a statistics matrix, wherein:
(a) the statistics matrix is used to store approximate statistics for different groups of timestamped datapoints in a plurality of time slices,
(b) the counts matrix implements a count-min sketch to store approximate counts of datapoints in the different groups in the time slices,
(c) the counts matrix is a three-dimensional matrix, wherein a first dimension corresponds to a set of hash functions used to hash the group key, a second dimension corresponds to respective hash buckets of the hash functions, and a third dimension corresponds to individual ones of the time slices, and
(d) the statistics matrix has the same dimensions as the counts matrix;
receiving a retrieve request to retrieve, from the TSADS, approximate statistics for a group of datapoints in the time slices, wherein the retrieve request specifies a group key of the group;
in response to the retrieve request:
for each of the time slices:
selecting a set of cells in the count-min sketch in the counts matrix based on the group key and the time slice, wherein each cell in the set stores an approximate count of datapoints in the group in the time slice;
determining a first cell from the set that stores a best approximate count; and
determining a best approximate statistic of the group in the time slice, wherein the best approximate statistic is retrieved from a second cell in the statistics matrix that corresponds to the first cell in the counts matrix; and
returning a time series of best approximate statistics of the group determined for each time slice.
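
The retrieval steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `TSADS`, the `add` method, the salted SHA-256 hashing, and the choice of a running sum as the stored statistic are all assumptions made for illustration; the claim itself does not specify how datapoints are inserted or which statistic is kept. The sketch also assumes that the "best approximate count" is the smallest count among the selected cells, which is the usual estimate in a count-min sketch.

```python
import hashlib


class TSADS:
    """Illustrative time-sliced approximate data structure (hypothetical sketch)."""

    def __init__(self, num_hashes, num_buckets, num_slices):
        self.num_hashes = num_hashes
        self.num_buckets = num_buckets
        self.num_slices = num_slices
        # Both matrices are indexed as [hash function][hash bucket][time slice].
        self.counts = [[[0] * num_slices for _ in range(num_buckets)]
                       for _ in range(num_hashes)]
        self.stats = [[[0.0] * num_slices for _ in range(num_buckets)]
                      for _ in range(num_hashes)]

    def _bucket(self, h, group_key):
        # Assumption: one hash function per index h, built by salting SHA-256.
        digest = hashlib.sha256(f"{h}:{group_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def add(self, group_key, time_slice, value):
        # Assumption: the statistic is a running sum; the claim does not fix this.
        for h in range(self.num_hashes):
            b = self._bucket(h, group_key)
            self.counts[h][b][time_slice] += 1
            self.stats[h][b][time_slice] += value

    def retrieve(self, group_key):
        series = []
        for t in range(self.num_slices):
            # Select one cell per hash function for this group key and time slice.
            cells = [(h, self._bucket(h, group_key))
                     for h in range(self.num_hashes)]
            # First cell: the smallest count is least inflated by collisions.
            h, b = min(cells, key=lambda c: self.counts[c[0]][c[1]][t])
            # Second cell: same position in the statistics matrix.
            series.append(self.stats[h][b][t])
        return series


sketch = TSADS(num_hashes=4, num_buckets=64, num_slices=3)
sketch.add("host-a", 0, 2.0)
sketch.add("host-a", 0, 3.0)
sketch.add("host-a", 2, 1.5)
print(sketch.retrieve("host-a"))  # [5.0, 0.0, 1.5]
```

With a single group key there are no hash collisions, so the returned time series is exact. With many keys, a retrieved sum of non-negative values can only overestimate the true value, mirroring the one-sided error of the count-min sketch.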