US 12,066,933 B2
Combined garbage collection and data integrity checking for a distributed key-value store
Wei Sun, Boulder, CO (US); Mark David Olson, Longmont, CO (US); and Anil Paul Thoppil, Pleasanton, CA (US)
Assigned to NetApp, Inc., San Jose, CA (US)
Filed by NetApp, Inc., San Jose, CA (US)
Filed on Feb. 25, 2022, as Appl. No. 17/680,484.
Claims priority of provisional application 63/276,829, filed on Nov. 8, 2021.
Prior Publication US 2023/0145784 A1, May 11, 2023
Int. Cl. G06F 12/02 (2006.01); G06F 16/22 (2019.01)
CPC G06F 12/0253 (2013.01) [G06F 16/2246 (2019.01); G06F 16/2272 (2019.01)]
34 Claims
1. A method for combining garbage collection and data integrity checking on a distributed key-value (KV) store utilized by a cluster of a distributed storage management system, the method comprising:
during a metadata collection phase of the garbage collection, concurrently identifying within the distributed KV store (i) unused block identifiers (IDs), corresponding to data blocks that represent garbage to be collected, that are no longer in use by the cluster but that are present in the distributed KV store and (ii) data integrity errors in a form of missing block IDs that are in use by the cluster but that are missing from the distributed KV store;
marking the unused block IDs for deletion from the distributed KV store;
adding the missing block IDs to a list of block IDs for which remediation of the respective data integrity errors is to be subsequently performed;
performing the garbage collection at a first predetermined or configurable interval with truncated block IDs; and
performing the garbage collection at a second predetermined or configurable interval with full block IDs.
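
The metadata collection phase recited in claim 1 can be illustrated as a pair of set comparisons. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `truncate`, `collect_metadata`, and `prefix_len` are hypothetical, block IDs are modeled as byte strings, truncation is assumed to mean keeping a fixed-length prefix, and the distributed KV store and the cluster's in-use set are modeled as in-memory sets. Scheduling of the two intervals is not shown.

```python
from typing import Iterable, Optional, Set, Tuple


def truncate(block_id: bytes, prefix_len: Optional[int]) -> bytes:
    """Return the full ID, or only its first prefix_len bytes (assumed truncation scheme)."""
    return block_id if prefix_len is None else block_id[:prefix_len]


def collect_metadata(
    in_use: Iterable[bytes],
    in_store: Iterable[bytes],
    prefix_len: Optional[int] = None,
) -> Tuple[Set[bytes], Set[bytes]]:
    """One pass that yields both garbage candidates and integrity errors.

    unused:  IDs present in the store but not in use (to be marked for deletion).
    missing: IDs in use but absent from the store (to be queued for remediation).
    With prefix_len set, comparison uses truncated IDs, so two different IDs
    sharing a prefix are treated as the same ID in this sketch.
    """
    in_use, in_store = set(in_use), set(in_store)
    used_keys = {truncate(b, prefix_len) for b in in_use}
    store_keys = {truncate(b, prefix_len) for b in in_store}
    unused = {b for b in in_store if truncate(b, prefix_len) not in used_keys}
    missing = {b for b in in_use if truncate(b, prefix_len) not in store_keys}
    return unused, missing


# Hypothetical sample data: two IDs share the prefix b"bbbb".
in_use = {b"aaaa1111", b"bbbb2222", b"cccc3333"}
in_store = {b"aaaa1111", b"bbbb9999", b"dddd4444"}

# Pass with truncated IDs (4-byte prefix).
unused_trunc, missing_trunc = collect_metadata(in_use, in_store, prefix_len=4)

# Pass with full IDs.
unused_full, missing_full = collect_metadata(in_use, in_store)
```

In this sketch the truncated pass is conservative: the prefix collision on `b"bbbb"` hides both the garbage block `b"bbbb9999"` and the missing block `b"bbbb2222"`, and the full-ID pass finds them. That is one plausible reading of why the claim recites both a truncated-ID interval and a full-ID interval, although the claim itself does not state the rationale.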