CPC G06F 16/215 (2019.01) [G06F 3/0608 (2013.01); G06F 3/0641 (2013.01); G06F 3/0673 (2013.01); G06F 16/907 (2019.01)] | 21 Claims |
1. A method of performing data deduplication, comprising:
providing a deduplication database in a data storage system, the deduplication database configured to store multiple entries that associate digest values computed from respective sub-blocks of data blocks with references to locations in the data storage system where data blocks containing the respective sub-blocks can be found;
implementing a policy for adding entries to the deduplication database, the policy specifying creation of new entries for first sub-blocks and last sub-blocks of data blocks but not for intermediate sub-blocks of those data blocks; and
processing a new data block for deduplication by (i) identifying multiple sub-blocks of the new data block, (ii) computing respective new digest values from the sub-blocks of the new data block, (iii) matching one of the new digest values to an entry in the deduplication database, and (iv) storing the new data block at least in part by reference to a target data block whose location is referenced by the matching entry.
|
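The method of claim 1 can be illustrated with a minimal in-memory sketch. Everything concrete here is an assumption made for illustration and is not taken from the claim: the 512-byte sub-block size, SHA-256 as the digest, integer locations, the `DedupStore` class and its field names, and the choice to add database entries only for blocks stored whole. The claim requires only that a matching digest lead to storing the new block at least in part by reference to the target block; the sketch does this by aligning the two blocks on the matching sub-block and keeping only the sub-blocks that differ.

```python
import hashlib

SUB_SIZE = 512  # assumed sub-block size in bytes (not specified by the claim)


def split(block: bytes) -> list[bytes]:
    """Step (i): identify the sub-blocks of a data block."""
    return [block[i:i + SUB_SIZE] for i in range(0, len(block), SUB_SIZE)]


def digest(sub: bytes) -> bytes:
    """Step (ii): compute a digest value from a sub-block (SHA-256 assumed)."""
    return hashlib.sha256(sub).digest()


class DedupStore:
    """Hypothetical in-memory stand-in for the data storage system."""

    def __init__(self) -> None:
        self.raw: dict[int, bytes] = {}   # location -> block stored whole
        # location -> (target location, shift, sub-block count, unique sub-blocks)
        self.refs: dict[int, tuple[int, int, int, dict[int, bytes]]] = {}
        # deduplication database: digest -> (location, sub-block index)
        self.db: dict[bytes, tuple[int, int]] = {}
        self.next_loc = 0

    def write(self, block: bytes) -> int:
        if not block:
            raise ValueError("empty blocks are not handled in this sketch")
        loc = self.next_loc
        self.next_loc += 1
        subs = split(block)
        for i, sub in enumerate(subs):
            hit = self.db.get(digest(sub))        # step (iii): match a digest
            if hit is None:
                continue
            target_loc, target_idx = hit
            target = split(self.raw[target_loc])
            shift = target_idx - i                # aligns new block with target
            # Keep only the sub-blocks the aligned target cannot supply.
            unique = {
                j: s for j, s in enumerate(subs)
                if not (0 <= j + shift < len(target) and target[j + shift] == s)
            }
            if i in unique:                       # digest collision: not a match
                continue
            # Step (iv): store the new block partly by reference to the target.
            self.refs[loc] = (target_loc, shift, len(subs), unique)
            return loc
        # No match: store the block whole and apply the entry policy, which
        # creates entries for the first and last sub-blocks only.
        self.raw[loc] = block
        for idx in (0, len(subs) - 1):
            self.db.setdefault(digest(subs[idx]), (loc, idx))
        return loc

    def read(self, loc: int) -> bytes:
        if loc in self.raw:
            return self.raw[loc]
        target_loc, shift, count, unique = self.refs[loc]
        target = split(self.raw[target_loc])
        return b"".join(
            unique[j] if j in unique else target[j + shift] for j in range(count)
        )
```

In this sketch a new block that contains the first or last sub-block of a stored block at any position is detected and stored by reference, while a block sharing only an intermediate sub-block is not matched. That is the consequence of the policy: the database holds two entries per stored block rather than one per sub-block.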