CPC G06F 3/0655 (2013.01) [G06F 3/0604 (2013.01); G06F 3/067 (2013.01)] | 20 Claims |
1. A method comprising:
applying, by a computing device, a first membership test to a first group of candidate block identifiers corresponding to a first data set in a distributed storage system to generate a first efficiency set;
applying, by the computing device, a second membership test to a second group of candidate block identifiers corresponding to a second data set in the distributed storage system to generate a second efficiency set;
wherein at least one of the first membership test or the second membership test includes a filter that specifies that a candidate block identifier is added to the first efficiency set or the second efficiency set, respectively, when the candidate block identifier matches a threshold number of bits of a filter sequence of bits;
determining, by the computing device, a set difference based on a comparison of the first efficiency set and the second efficiency set; and
estimating, by the computing device, an amount of unique data within the second data set based on the set difference.
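The steps recited in claim 1 can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the claimed implementation. The claim does not say how block identifiers are derived, which bits are compared, or how the set difference is scaled into a data amount. Here, block identifiers are assumed to be integers (for example, content hashes), the membership test is assumed to compare the low-order `threshold` bits of an identifier against the same bits of a filter sequence, and the estimate is assumed to scale the difference count by the sampling ratio and a fixed block size. The names `block_id`, `efficiency_set`, `estimate_unique_bytes`, and `BLOCK_SIZE` are hypothetical.

```python
import hashlib
from typing import Iterable

BLOCK_SIZE = 4096  # assumed fixed block size in bytes; not specified by the claim


def block_id(data: bytes) -> int:
    """Hypothetical block identifier: the SHA-256 digest of the block as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def efficiency_set(block_ids: Iterable[int], filter_bits: int, threshold: int) -> set:
    """Membership test: keep identifiers whose low-order `threshold` bits
    equal the low-order `threshold` bits of the filter sequence (assumed reading)."""
    mask = (1 << threshold) - 1
    wanted = filter_bits & mask
    return {b for b in block_ids if (b & mask) == wanted}


def estimate_unique_bytes(
    first_ids: Iterable[int],
    second_ids: Iterable[int],
    filter_bits: int = 0,
    threshold: int = 4,
    block_size: int = BLOCK_SIZE,
) -> int:
    """Estimate the amount of data in the second data set not shared with the first."""
    first = efficiency_set(first_ids, filter_bits, threshold)
    second = efficiency_set(second_ids, filter_bits, threshold)
    # Set difference: sampled identifiers present only in the second data set.
    only_in_second = second - first
    # Each sampled identifier stands in for roughly 2**threshold blocks,
    # assuming identifier bits are uniformly distributed.
    return len(only_in_second) * (1 << threshold) * block_size
```

Because both data sets are sampled with the same filter, an identifier shared by both either appears in both efficiency sets or in neither, so the difference of the small sets approximates the difference of the full sets. A larger `threshold` shrinks the efficiency sets and lowers the cost of the comparison, at the price of a coarser estimate.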