US 11,960,458 B2
Deduplicating data at sub-block granularity
Philippe Armangau, Acton, MA (US); Sorin Faibish, Newton, MA (US); Istvan Gonczi, Berkley, MA (US); Ivan Bassov, Brookline, MA (US); and Vamsi K. Vankamamidi, Hopkinton, MA (US)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Mar. 17, 2021, as Appl. No. 17/203,876.
Application 17/203,876 is a continuation of application No. 16/176,729, filed on Oct. 31, 2018, granted, now Pat. No. 10,963,436.
Prior Publication US 2021/0286783 A1, Sep. 16, 2021
Int. Cl. G06F 16/215 (2019.01); G06F 3/06 (2006.01); G06F 16/907 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 3/0608 (2013.01); G06F 3/0641 (2013.01); G06F 3/0673 (2013.01); G06F 16/907 (2019.01)]
21 Claims
 
1. A method of performing data deduplication, comprising:
providing a deduplication database in a data storage system, the deduplication database configured to store multiple entries that associate digest values computed from respective sub-blocks of data blocks with references to locations in the data storage system where data blocks containing the respective sub-blocks can be found;
implementing a policy for adding entries to the deduplication database, the policy specifying creation of new entries for first sub-blocks and last sub-blocks of data blocks but not for intermediate sub-blocks of those data blocks; and
processing a new data block for deduplication by (i) identifying multiple sub-blocks of the new data block, (ii) computing respective new digest values from the sub-blocks of the new data block, (iii) matching one of the new digest values to an entry in the deduplication database, and (iv) storing the new data block at least in part by reference to a target data block whose location is referenced by the matching entry.
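The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The 512-byte sub-block size, the 8 sub-blocks per block, the use of SHA-256 as the digest, and the names `Store`, `Partial`, `split`, and `digest` are all assumptions chosen for illustration. The sketch also makes two simplifications the claim does not require: database entries are created only for blocks stored in full, and a digest match is extended to neighboring sub-blocks by byte comparison.

```python
import hashlib
from dataclasses import dataclass, field

SUB = 512        # assumed sub-block size in bytes
PER_BLOCK = 8    # assumed number of sub-blocks per data block


def split(block: bytes) -> list:
    """Identify the sub-blocks of a block (fixed-size slices)."""
    return [block[i:i + SUB] for i in range(0, len(block), SUB)]


def digest(sub: bytes) -> bytes:
    """Digest of one sub-block; SHA-256 is an assumption."""
    return hashlib.sha256(sub).digest()


@dataclass
class Partial:
    """A block stored in part by reference to a target block."""
    target: int        # location of the target block
    new_start: int     # first shared sub-block index in the new block
    target_start: int  # first shared sub-block index in the target
    count: int         # number of shared sub-blocks
    unique: bytes      # sub-blocks of the new block that are not shared


@dataclass
class Store:
    blocks: list = field(default_factory=list)     # location -> bytes or Partial
    dedupe_db: dict = field(default_factory=dict)  # digest -> (location, index)

    def _add_entries(self, loc: int, subs: list) -> None:
        # Policy: entries for the first and last sub-blocks only,
        # never for intermediate sub-blocks.
        for idx in {0, len(subs) - 1}:
            self.dedupe_db.setdefault(digest(subs[idx]), (loc, idx))

    def read(self, loc: int) -> bytes:
        rec = self.blocks[loc]
        if isinstance(rec, bytes):
            return rec
        tsubs = split(self.read(rec.target))
        usubs = split(rec.unique)
        shared = tsubs[rec.target_start:rec.target_start + rec.count]
        return b"".join(usubs[:rec.new_start] + shared + usubs[rec.new_start:])

    def write(self, block: bytes) -> int:
        assert len(block) == SUB * PER_BLOCK
        subs = split(block)                       # (i) identify sub-blocks
        loc = len(self.blocks)
        for i, s in enumerate(subs):
            hit = self.dedupe_db.get(digest(s))   # (ii) digest, (iii) match
            if hit is None:
                continue
            target_loc, j = hit
            tsubs = split(self.read(target_loc))
            if tsubs[j] != s:                     # guard against collisions
                continue
            shift = j - i
            a, b = i, i + 1
            # Extend the match backward and forward by byte comparison.
            while a > 0 and a - 1 + shift >= 0 and subs[a - 1] == tsubs[a - 1 + shift]:
                a -= 1
            while b < len(subs) and b + shift < len(tsubs) and subs[b] == tsubs[b + shift]:
                b += 1
            unique = b"".join(subs[:a] + subs[b:])
            # (iv) store the new block in part by reference to the target.
            self.blocks.append(Partial(target_loc, a, a + shift, b - a, unique))
            return loc
        self.blocks.append(block)                 # no match: store in full
        self._add_entries(loc, subs)
        return loc
```

In this sketch, a new block whose trailing sub-blocks repeat the leading sub-blocks of an earlier block is found through the earlier block's first-sub-block entry, while a block that repeats only intermediate sub-blocks finds no entry and is stored in full. Indexing only the first and last sub-blocks keeps the database at two entries per stored block rather than one per sub-block.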