US 11,892,979 B2
Storage system garbage collection and defragmentation
Anubhav Gupta, Sunnyvale, CA (US); and Anirvan Duttagupta, San Jose, CA (US)
Assigned to Cohesity, Inc., San Jose, CA (US)
Filed by Cohesity, Inc., San Jose, CA (US)
Filed on Dec. 13, 2021, as Appl. No. 17/549,599.
Application 17/549,599 is a continuation of application No. 16/733,605, filed on Jan. 3, 2020, granted, now 11,226,934.
Application 16/733,605 is a continuation of application No. 16/279,780, filed on Feb. 19, 2019, granted, now 10,706,014, issued on Jul. 7, 2020.
Prior Publication US 2022/0179828 A1, Jun. 9, 2022
Int. Cl. G06F 16/00 (2019.01); G06F 16/17 (2019.01); G06F 3/06 (2006.01); G06F 16/901 (2019.01); G06F 12/02 (2006.01); G06F 16/182 (2019.01); G06F 16/908 (2019.01)

CPC G06F 16/1724 (2019.01) [G06F 3/0604 (2013.01); G06F 3/0643 (2013.01); G06F 3/0683 (2013.01); G06F 12/0253 (2013.01); G06F 16/182 (2019.01); G06F 16/908 (2019.01); G06F 16/9027 (2019.01)]

16 Claims

1. A method, comprising:

scanning, by a processor of a storage system, a plurality of tree data structures to determine a number of corresponding references associated with each data chunk included in a plurality of chunk files stored by the storage system and scanning the plurality of chunk files to determine a plurality of corresponding chunk file scores for each chunk file included in the plurality of chunk files stored by the storage system;

determining that a subset of the plurality of chunk files have corresponding chunk file scores above a chunk file score threshold, wherein determining the plurality of corresponding chunk file scores for the plurality of chunk files stored by the storage system includes assigning a worker of a plurality of workers to a corresponding tree data structure in the plurality of tree data structures stored by the storage system, wherein at least two workers of the plurality of workers each scan a different assigned tree data structure of the plurality of tree data structures in parallel;

including data chunks from the subset of the plurality of chunk files in a combined chunk file; and

deleting chunk files included in the subset of the plurality of chunk files from which the data chunks were included in the combined chunk file.
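
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that do not come from the claim text: trees are modeled as nested dictionaries, chunk files as in-memory dictionaries, the chunk file score is taken to be the fraction of chunks in a file that have zero references, and only chunks that are still referenced are copied into the combined chunk file. All function and variable names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory model (an assumption, not from the claim):
#   tree        -> {"chunk_ids": [...], "children": [tree, ...]}
#   chunk_files -> {file_id: {chunk_id: bytes}}


def count_references_in_tree(tree):
    """Walk one tree and count how often each chunk id is referenced."""
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        counts.update(node.get("chunk_ids", []))
        stack.extend(node.get("children", []))
    return counts


def scan_trees(trees, max_workers=4):
    """Hand each tree to a worker; different trees are scanned in parallel."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for counts in pool.map(count_references_in_tree, trees):
            total.update(counts)
    return total


def score_chunk_file(chunks, ref_counts):
    """Assumed score: fraction of chunks in the file with zero references."""
    if not chunks:
        return 1.0
    unreferenced = sum(1 for cid in chunks if ref_counts[cid] == 0)
    return unreferenced / len(chunks)


def collect_and_defragment(trees, chunk_files, threshold, combined_id="combined"):
    """Score chunk files, merge those above the threshold, delete the originals."""
    ref_counts = scan_trees(trees)
    scores = {
        fid: score_chunk_file(chunks, ref_counts)
        for fid, chunks in chunk_files.items()
    }
    selected = [fid for fid, score in scores.items() if score > threshold]

    # Assumption: only chunks that are still referenced are carried over.
    combined = {}
    for fid in selected:
        for cid, data in chunk_files[fid].items():
            if ref_counts[cid] > 0:
                combined[cid] = data

    # Delete the source chunk files whose chunks were combined.
    for fid in selected:
        del chunk_files[fid]
    if combined:
        chunk_files[combined_id] = combined
    return scores, selected


if __name__ == "__main__":
    trees = [
        {"chunk_ids": ["a"], "children": [{"chunk_ids": ["c"]}]},
        {"chunk_ids": ["a", "e"]},
    ]
    chunk_files = {
        "f1": {"a": b"A", "b": b"B"},
        "f2": {"c": b"C", "d": b"D", "x": b"X"},
        "f3": {"e": b"E"},
    }
    scores, selected = collect_and_defragment(trees, chunk_files, threshold=0.4)
    print(scores, selected, sorted(chunk_files))
```

A thread pool is used here only to show the one-worker-per-tree assignment; a real storage system would more plausibly distribute workers across processes or nodes, and would derive the score from on-disk metadata rather than in-memory dictionaries.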