US 12,411,810 B1
Efficient removal of stale entries from on-drive deduplication hash tables using hash prefix indexing
Amit Zaitman, Shavey Shomron (IL); Uri Shabi, Tel Mond (IL); and Alexander Shknevsky, Fair Lawn, NJ (US)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Aug. 16, 2024, as Appl. No. 18/806,834.
Int. Cl. G06F 16/174 (2019.01); G06F 16/14 (2019.01)

CPC G06F 16/1748 (2019.01) [G06F 16/152 (2019.01)]

20 Claims

1. A method comprising:

providing or making accessible an on-drive deduplication (“dedupe”) index, the on-drive dedupe index including a plurality of index entries, each index entry including a digest of a data page and an address associated with a location where the data page is stored, each digest having a digest prefix, each data page being associated with a reference count, each index entry being assigned to a bucket data structure (“bucket”) defined by a respective digest prefix;

for each data page associated with a reference count decremented to zero (0), logging, in a dedupe log, a digest prefix of the data page and a corresponding address associated with a location where the data page is stored; and

for each bucket of the on-drive dedupe index:

constructing, dynamically and on-demand, an address bag data structure (“address bag”);

storing, in the address bag, one or more addresses from the dedupe log whose corresponding digest prefix is the same as the respective digest prefix defining the bucket; and

removing, from the bucket, each index entry that includes an address matching one of the addresses in the address bag, the index entry being regarded as a stale index entry.
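
For illustration only, the steps recited in claim 1 can be sketched in Python as follows. This is a minimal in-memory sketch under stated assumptions, not the patented implementation: the names `DedupeIndex`, `IndexEntry`, `PREFIX_LEN`, `log_zero_ref`, and `remove_stale` are hypothetical, SHA-256 stands in for whatever digest function is actually used, a Python `set` stands in for the address bag, and plain dictionaries and lists stand in for the on-drive structures.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass

PREFIX_LEN = 2  # assumption: the first 2 digest bytes define a bucket


@dataclass(frozen=True)
class IndexEntry:
    digest: bytes   # digest of the data page
    address: int    # address of the location where the page is stored


def digest_of(page: bytes) -> bytes:
    # Assumption: SHA-256 is used here only as a stand-in digest function.
    return hashlib.sha256(page).digest()


class DedupeIndex:
    def __init__(self) -> None:
        # Bucket keyed by digest prefix, holding that prefix's index entries.
        self.buckets: dict[bytes, list[IndexEntry]] = defaultdict(list)
        # Dedupe log of (digest prefix, address) pairs for zero-refcount pages.
        self.dedupe_log: list[tuple[bytes, int]] = []

    def add(self, page: bytes, address: int) -> None:
        d = digest_of(page)
        self.buckets[d[:PREFIX_LEN]].append(IndexEntry(d, address))

    def log_zero_ref(self, page: bytes, address: int) -> None:
        # Called when a page's reference count has been decremented to zero.
        self.dedupe_log.append((digest_of(page)[:PREFIX_LEN], address))

    def remove_stale(self) -> int:
        removed = 0
        for prefix in list(self.buckets):
            # Build the address bag on demand for this bucket only.
            bag = {addr for p, addr in self.dedupe_log if p == prefix}
            if not bag:
                continue
            entries = self.buckets[prefix]
            kept = [e for e in entries if e.address not in bag]
            removed += len(entries) - len(kept)
            self.buckets[prefix] = kept
        self.dedupe_log.clear()
        return removed


index = DedupeIndex()
index.add(b"page-a", 100)
index.add(b"page-b", 200)
index.add(b"page-c", 300)
index.log_zero_ref(b"page-b", 200)   # page-b is no longer referenced
stale_removed = index.remove_stale()
```

In this sketch, matching is done on the stored address rather than the full digest, since the log carries only the digest prefix and the address. Rescanning the whole log for every bucket keeps the example short; a real system would presumably organize the log so that each bucket's addresses can be gathered without a full scan.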