US 11,934,656 B2
Garbage collection and bin synchronization for distributed storage architecture
Manan Dahyabhai Patel, Sunnyvale, CA (US); and Wei Sun, Boulder, CO (US)
Assigned to NetApp, Inc., San Jose, CA (US)
Filed by NetApp Inc., San Jose, CA (US)
Filed on Apr. 11, 2022, as Appl. No. 17/717,454.
Prior Publication US 2023/0325081 A1, Oct. 12, 2023
Int. Cl. G06F 3/06 (2006.01)
CPC G06F 3/0608 (2013.01) [G06F 3/0652 (2013.01); G06F 3/067 (2013.01)]
20 Claims
 
1. A system, comprising:
a distributed storage architecture including worker nodes managing distributed storage comprised of storage devices managed by the worker nodes;
a slice service hosted at each of the worker nodes, wherein an instance of the slice service at a worker node generates a probabilistic structure used to indicate block identifiers of in-use blocks of the distributed storage that are used by the worker node to store data;
a block service hosted at each of the worker nodes, wherein an instance of the block service at the worker node manages bins composed of one or more blocks of the distributed storage managed by the worker node;
a garbage collection process hosted through the block service at each of the worker nodes, wherein an instance of the garbage collection process at the worker node performs garbage collection rounds by comparing probabilistic structures, received from instances of the slice service at the worker nodes, to block identifiers within a subset of a bin to identify and free unused blocks within the subset of the bin, wherein the unused blocks correspond to block identifiers not indicated by the probabilistic structures; and
a garbage collection management service dynamically selecting a portion of the bin as the subset of the bin to process during a garbage collection round based upon heuristics corresponding to at least one of an amount of fullness of the distributed storage, a time elapsed since initialization of the garbage collection process, or an amount of unused blocks being freed.
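
The garbage collection round and the dynamic subset selection recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the probabilistic structure is a Bloom filter (the claim does not name one), and the names `BloomFilter`, `choose_subset_fraction`, and `gc_round`, along with every threshold and fraction, are hypothetical values chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Assumed probabilistic structure: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, block_id):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{block_id}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, block_id):
        for pos in self._positions(block_id):
            self.bits[pos] = 1

    def __contains__(self, block_id):
        return all(self.bits[pos] for pos in self._positions(block_id))


def choose_subset_fraction(fullness, seconds_since_init, freed_ratio):
    """Hypothetical heuristic for how much of a bin to scan next round.

    fullness: fraction of distributed storage in use (0.0 to 1.0).
    seconds_since_init: time elapsed since the GC process was initialized.
    freed_ratio: fraction of scanned blocks freed in the previous round.
    """
    if fullness >= 0.9:          # nearly full: scan the whole bin
        return 1.0
    fraction = 0.25              # illustrative default portion
    if seconds_since_init > 3600:
        fraction = 0.5           # GC has been running a while: scan more
    if freed_ratio > 0.3:        # last round was productive: scan more
        fraction = min(1.0, fraction * 2)
    return fraction


def gc_round(bin_block_ids, filters, start, fraction):
    """Scan one subset of a bin; return (unused block ids, next start offset).

    A block is treated as unused only if no slice-service filter indicates it.
    """
    if not bin_block_ids:
        return [], 0
    count = max(1, int(len(bin_block_ids) * fraction))
    subset = bin_block_ids[start:start + count]
    unused = [b for b in subset if not any(b in f for f in filters)]
    return unused, (start + len(subset)) % len(bin_block_ids)
```

Because a Bloom filter can report false positives but never false negatives, a sketch of this kind may leave some unused blocks in place for a later round, but it does not free a block that a slice service has marked as in use.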