| CPC G06F 11/1448 (2013.01) [G06F 2201/80 (2013.01)] | 10 Claims |

|
1. A method for garbage collection in a backup system, comprising:
traversing metadata associated with a plurality of backups stored in the backup system, and dividing the metadata into a first set associated with valid backups and a second set associated with invalid backups, wherein each of the plurality of backups is represented in the metadata, respectively, by a tree representation of data of the plurality of backups stored in the backup system, each tree representation comprising a plurality of nodes, each of the plurality of nodes comprising a corresponding one of a plurality of hashes including a root hash that references at least a second of the plurality of hashes that references an element of a storage device containing a portion of the data of the plurality of backups, wherein at least one of the plurality of hashes associated with a first backup references a same portion of the data as a different one of the plurality of hashes from a different one of the plurality of backups, and wherein the data stored in the backup system for each of the plurality of backups is represented by a different subset of the plurality of nodes of the tree representation, wherein the metadata is divided by assigning, to each root hash of each tree representation associated with the valid backup, a respective single bit flag having a first value, and assigning, to each root hash of each tree representation associated with the invalid backup, a respective single bit flag having a second value;
for each of the plurality of nodes of each valid backup and not for each invalid backup, generating a flag indicating that the node is associated with the valid backup;
determining, based on the respective single bit flag associated with a root hash of the plurality of nodes in the tree representation of the backup system, whether the root hash in the tree representation is associated with the valid backup or an invalid backup;
in response to the respective single bit flag indicating that the root hash is associated with the valid backup, skipping processing of the respective tree representation of the backup system associated with the valid backup and keeping the respective tree representation; and
in response to the single bit flag indicating that the root hash is associated with the invalid backup, deleting the root hash of the respective tree representation associated with the invalid backup and traversing additional one or more nodes of the respective tree representation associated with the invalid backup; and
for each of the additional one or more nodes of the respective tree representation of the respective invalid backup,
determining whether a corresponding single bit flag is found, and
in response to the corresponding single bit flag not being found, deleting the respective one of the additional one or more nodes.
|
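
The steps recited in claim 1 resemble a mark-and-sweep pass over deduplicated hash trees, and can be sketched as follows. This is an illustrative reading only, not the patented implementation: the `Node` class, the `collect_garbage` function, the in-memory `store` dictionary, and all hash names are hypothetical. The sketch also makes two assumptions the claim does not state: a root hash shared by a valid and an invalid backup is treated as valid, and traversal stops descending at a flagged node because everything beneath it is flagged as well.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One tree node, addressed by its hash (hypothetical layout)."""
    children: list[str] = field(default_factory=list)  # child hashes; empty for a leaf
    chunk: Optional[str] = None  # storage element holding the data, for a leaf


def collect_garbage(store: dict[str, Node],
                    roots: dict[str, str],
                    valid_backups: set[str]) -> set[str]:
    """Delete nodes reachable only from invalid backups; return deleted hashes."""
    # Divide the metadata: one boolean flag per root hash
    # (True = valid backup, False = invalid backup).
    # Assumption: if any valid backup uses a root hash, the root counts as valid.
    root_flag: dict[str, bool] = {}
    for name, root in roots.items():
        root_flag[root] = root_flag.get(root, False) or (name in valid_backups)

    # Flag every node of every valid backup; invalid backups are not walked here.
    marked: set[str] = set()
    for root, is_valid in root_flag.items():
        if not is_valid:
            continue
        stack = [root]
        while stack:
            h = stack.pop()
            if h in marked:
                continue
            marked.add(h)
            stack.extend(store[h].children)

    # Sweep: skip valid trees, walk invalid trees and delete unflagged nodes.
    deleted: set[str] = set()
    for root, is_valid in root_flag.items():
        if is_valid:
            continue  # valid tree is kept without further processing
        stack = [root]
        while stack:
            h = stack.pop()
            if h in marked or h not in store:
                continue  # shared with a valid backup, or already deleted
            stack.extend(store[h].children)
            del store[h]
            deleted.add(h)
    return deleted


# Two backups that share chunk "c2"; only backup "A" is still valid.
store = {
    "c1": Node(chunk="disk:0"),
    "c2": Node(chunk="disk:1"),
    "c3": Node(chunk="disk:2"),
    "rA": Node(children=["c1", "c2"]),
    "rB": Node(children=["c2", "c3"]),
}
deleted = collect_garbage(store, roots={"A": "rA", "B": "rB"}, valid_backups={"A"})
print(sorted(deleted))  # ['c3', 'rB']
print(sorted(store))    # ['c1', 'c2', 'rA']
```

In the example, the shared chunk `c2` survives because it carries a flag from the valid backup, while `rB` and `c3` are removed because no flag is found for them.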