US 12,072,798 B2
Scalable garbage collection for deduplicated storage
Philip Shilane, Newtown, PA (US); Kimberly Lu, Sunnyvale, CA (US); Joseph Brandt, Emeryville, CA (US); Nicholas Noto, Sunnyvale, CA (US); Tipper Truong, San Jose, CA (US); and Mariah Arevalo, Sunnyvale, CA (US)
Assigned to EMC IP HOLDING COMPANY LLC, Hopkinton, MA (US)
Filed by EMC IP HOLDING COMPANY LLC, Hopkinton, MA (US)
Filed on Jul. 17, 2021, as Appl. No. 17/378,690.
Application 17/378,690 is a division of application No. 16/265,491, filed on Feb. 1, 2019, granted, now 11,068,390, issued on Jul. 20, 2021.
Prior Publication US 2021/0342264 A1, Nov. 4, 2021
Int. Cl. G06F 12/02 (2006.01)
CPC G06F 12/0253 (2013.01) [G06F 2212/1044 (2013.01); G06F 2212/1048 (2013.01); G06F 2212/154 (2013.01)]
18 Claims
 
1. A method comprising:
identifying impacted similarity groups that are impacted by a garbage collection operation, wherein the impacted similarity groups include segments associated with deleted objects and segments associated with live objects;
identifying sub-groups, of each of the impacted similarity groups, impacted by the garbage collection operation;
write locking at least the impacted sub-groups of the impacted similarity groups in order to support a normal operation and the garbage collection operation concurrently in the impacted similarity groups;
directing normal operations to a highest numbered sub-group of each of the impacted similarity groups, the normal operations including a write operation;
determining sizes of the impacted similarity groups individually and determining a size of all the impacted similarity groups;
determining a number of workers to perform the garbage collection operation that cleans the deleted objects stored in the storage system from objects stored in the storage system based on the sizes of the impacted similarity groups individually and a size of all the impacted similarity groups;
assigning each of the workers a range of the impacted similarity groups based on the sizes of the impacted similarity groups and the size of all the impacted similarity groups;
removing the segments associated with the deleted objects from the impacted sub-groups; and
updating the impacted similarity groups to reflect that the segments associated with the deleted objects have been removed.
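
The worker-sizing and range-assignment steps recited above can be illustrated with a minimal sketch. It is not the patented implementation. The names `SimilarityGroup`, `worker_count`, and `assign_ranges`, the `bytes_per_worker` and `max_workers` parameters, and the contiguous-range balancing rule are all assumptions introduced for illustration; the claim only requires that the number of workers and their ranges depend on the individual group sizes and on the total size.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class SimilarityGroup:
    group_id: int    # hypothetical identifier of an impacted similarity group
    size_bytes: int  # size of this impacted similarity group


def worker_count(groups, bytes_per_worker, max_workers):
    """Choose a worker count from the total size of all impacted groups.

    bytes_per_worker and max_workers are assumed tuning knobs.
    """
    total = sum(g.size_bytes for g in groups)
    if total == 0:
        return 0
    return max(1, min(max_workers, ceil(total / bytes_per_worker)))


def assign_ranges(groups, n_workers):
    """Split groups, ordered by id, into contiguous ranges of similar size."""
    if n_workers <= 0:
        return []
    ordered = sorted(groups, key=lambda g: g.group_id)
    total = sum(g.size_bytes for g in ordered)
    target = total / n_workers  # ideal share of the total per worker
    ranges = [[] for _ in range(n_workers)]
    worker, acc = 0, 0
    for g in ordered:
        ranges[worker].append(g.group_id)
        acc += g.size_bytes
        # Move on once the running total covers this worker's share.
        while worker < n_workers - 1 and acc >= target * (worker + 1):
            worker += 1
    # A very large group can consume several shares; drop empty ranges.
    return [r for r in ranges if r]


groups = [
    SimilarityGroup(0, 100),
    SimilarityGroup(1, 100),
    SimilarityGroup(2, 300),
    SimilarityGroup(3, 100),
    SimilarityGroup(4, 200),
]
n = worker_count(groups, bytes_per_worker=400, max_workers=8)
print(n, assign_ranges(groups, n))
```

In this sketch the total size sets how many workers run, while the individual sizes decide where each worker's contiguous range ends, so one large similarity group does not overload a single worker along with many small ones.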