US 12,189,981 B2
Efficiency sets for determination of unique data
Alyssa Proulx, Longmont, CO (US); and Mark David Olson, Longmont, CO (US)
Assigned to NETAPP, INC., San Jose, CA (US)
Filed by NetApp, Inc., San Jose, CA (US)
Filed on Jan. 30, 2023, as Appl. No. 18/161,391.
Application 18/161,391 is a continuation of application No. 17/457,117, filed on Dec. 1, 2021, granted, now 11,567,694.
Application 17/457,117 is a continuation of application No. 16/940,461, filed on Jul. 28, 2020, granted, now 11,194,506, issued on Dec. 7, 2021.
Prior Publication US 2023/0176773 A1, Jun. 8, 2023
Int. Cl. G06F 3/06 (2006.01)
CPC G06F 3/0655 (2013.01) [G06F 3/0604 (2013.01); G06F 3/067 (2013.01)]
20 Claims
 
1. A method comprising:
applying, by a computing device, a first membership test to a first group of candidate block identifiers corresponding to a first data set in a distributed storage system to generate a first efficiency set;
applying, by the computing device, a second membership test to a second group of candidate block identifiers corresponding to a second data set in the distributed storage system to generate a second efficiency set;
wherein at least one of the first membership test or the second membership test includes a filter that specifies that a candidate block identifier is added to the first efficiency set or the second efficiency set, respectively, when the candidate block identifier matches a threshold number of bits of a filter sequence of bits;
determining, by the computing device, a set difference based on a comparison of the first efficiency set and the second efficiency set; and
estimating, by the computing device, an amount of unique data within the second data set based on the set difference.
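The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; it only shows one way the recited steps could fit together, under several assumptions not stated in the claim: block identifiers are 128-bit integers derived from a content hash and are roughly uniformly distributed, "matches a threshold number of bits" means the leading `threshold` bits of the identifier equal the leading bits of the filter sequence, the same filter is used for both data sets, the set difference is taken as second minus first, and the sampled count is scaled by 2^threshold times a fixed block size. All names (`block_id`, `passes_filter`, `efficiency_set`, `estimate_unique_bytes`, `BLOCK_SIZE`, `ID_BITS`) are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096   # assumed fixed block size in bytes (not from the claim)
ID_BITS = 128       # assumed width of a block identifier in bits


def block_id(data: bytes) -> int:
    """Hypothetical content-derived block identifier: first 128 bits of SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")


def passes_filter(bid: int, filter_bits: int, threshold: int) -> bool:
    """Membership test: the leading `threshold` bits of the identifier
    must equal the leading `threshold` bits of the filter sequence."""
    shift = ID_BITS - threshold
    return (bid >> shift) == (filter_bits >> shift)


def efficiency_set(block_ids, filter_bits: int, threshold: int) -> set:
    """Keep only the candidate identifiers that pass the membership test."""
    return {b for b in block_ids if passes_filter(b, filter_bits, threshold)}


def estimate_unique_bytes(first_ids, second_ids,
                          filter_bits: int = 0, threshold: int = 4) -> int:
    """Estimate the bytes of data unique to the second data set.

    Each efficiency set samples about 1 in 2**threshold identifiers
    (assuming uniform identifiers), so the size of the set difference
    is scaled back up by that factor.
    """
    first = efficiency_set(first_ids, filter_bits, threshold)
    second = efficiency_set(second_ids, filter_bits, threshold)
    difference = second - first          # sampled IDs only in the second set
    return len(difference) * (1 << threshold) * BLOCK_SIZE
```

With `threshold=0` every identifier passes the filter, so the estimate reduces to an exact count of blocks present only in the second data set. Larger thresholds shrink the efficiency sets, and the comparison cost with them, at the price of a coarser estimate.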