US 12,293,102 B2
Method, device, and computer program product for de-duplicating data
Changxu Jiang, Chengdu (CN); Chen Gong, Beijing (CN); and Fei Wang, Chengdu (CN)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Jun. 28, 2023, as Appl. No. 18/215,414.
Claims priority of application No. 202211658388.5 (CN), filed on Dec. 22, 2022.
Prior Publication US 2024/0211154 A1, Jun. 27, 2024
Int. Cl. G06F 3/06 (2006.01)
CPC G06F 3/0641 (2013.01) [G06F 3/0608 (2013.01); G06F 3/0683 (2013.01)]
20 Claims
 
1. A method for de-duplicating data, comprising:
determining a target physical block in a first storage device, a plurality of data blocks in the target physical block being to be transferred to a second storage device;
determining a compression ratio of a target data block in the plurality of data blocks;
determining a target hash value of the target data block in response to the compression ratio being lower than a threshold compression ratio; and
determining a de-duplication operation for the target data block based on the target hash value and a de-duplication hash table, the de-duplication hash table storing hash values of data blocks that have been transferred from the first storage device to the second storage device;
wherein determining the de-duplication operation comprises:
determining from a plurality of logically contiguous data blocks a group of logically contiguous data blocks starting from the target data block, hash values of the group of logically contiguous data blocks hitting data blocks in the de-duplication hash table that are located in contiguous physical space;
determining whether the number of data blocks in the group of logically contiguous data blocks exceeds a threshold number; and
de-duplicating the group of logically contiguous data blocks in response to the number exceeding the threshold number.
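
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the compression ratio is taken as original size divided by zlib-compressed size, SHA-256 stands in for the hash function, the de-duplication hash table is modeled as a dict mapping a hash value to an integer physical block address on the second storage device, and the names compression_ratio, block_hash, and plan_dedup are hypothetical.

```python
import hashlib
import zlib


def compression_ratio(block: bytes) -> float:
    # Assumption: ratio = original size / compressed size (the claim does not define it).
    return len(block) / len(zlib.compress(block))


def block_hash(block: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash the storage system uses.
    return hashlib.sha256(block).hexdigest()


def plan_dedup(blocks, dedup_table, ratio_threshold, run_threshold):
    """Return (logical_start, length, physical_start) runs to de-duplicate.

    blocks:      logically contiguous data blocks of the target physical block
    dedup_table: hash value -> physical block address on the second device
                 (hypothetical model of the de-duplication hash table)
    """

    def lookup(block):
        # Hash only blocks whose compression ratio is below the threshold.
        if compression_ratio(block) >= ratio_threshold:
            return None
        return dedup_table.get(block_hash(block))

    runs = []
    i = 0
    while i < len(blocks):
        start_addr = lookup(blocks[i])
        if start_addr is None:
            i += 1
            continue
        # Extend the group while hits stay contiguous in physical space.
        length = 1
        while i + length < len(blocks):
            if lookup(blocks[i + length]) != start_addr + length:
                break
            length += 1
        # De-duplicate only when the group exceeds the threshold number.
        if length > run_threshold:
            runs.append((i, length, start_addr))
        i += length
    return runs
```

In this sketch, a group of hits shorter than or equal to run_threshold is left alone and would be transferred as ordinary data, so only sufficiently long physically contiguous matches are de-duplicated.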