US 12,455,698 B2
System and method for indexing a data item in a data storage system
Ovad Somech, Hod Hasharon (IL); Assaf Natanzon, Hod Hasharon (IL); Idan Zach, Hod Hasharon (IL); Aviv Kuvent, Hod Hasharon (IL); Yair Toaff, Hod Hasharon (IL); Elizabeth Firman, Hod Hasharon (IL); and David Spinadel, Hod Hasharon (IL)
Assigned to Huawei Technologies Co., Ltd., Shenzhen (CN)
Filed by HUAWEI TECHNOLOGIES CO., LTD., Shenzhen (CN)
Filed on Sep. 5, 2023, as Appl. No. 18/461,261.
Application 18/461,261 is a continuation of application No. PCT/EP2021/061371, filed on Apr. 30, 2021.
Prior Publication US 2023/0409222 A1, Dec. 21, 2023
Int. Cl. G06F 3/06 (2006.01)
CPC G06F 3/064 (2013.01) [G06F 3/0626 (2013.01); G06F 3/0671 (2013.01)]
20 Claims
 
1. A computer-implemented method for indexing a data item in a data storage system and handling metadata updates, the method comprising:
dividing, by the data storage system, the data item into one or more large blocks;
dividing, by the data storage system, each large block into a plurality of small blocks;
calculating, by the data storage system, a strong hash value for each of the small blocks of a respective large block and storing a respective list of the strong hash values calculated for the respective large block with a pointer to a location of the respective large block;
from the respective list of strong hash values calculated for the respective large block, selecting, by the data storage system, one or more representative strong hash values for the respective large block;
compiling, by the data storage system, a sparse index of weak hashes by calculating, for each of the one or more representative strong hash values, a respective weak hash, wherein each respective weak hash corresponds to a respective strong hash value for a respective large block;
storing, by the data storage system, the sparse index in a fast-access memory, wherein the fast-access memory comprises a random access memory (RAM), and wherein respective lists of strong hash values are stored in a disk storage, wherein the disk storage comprises a spinning disk storage or a solid-state drive (SSD);
receiving, by the data storage system, a metadata update request;
determining, by the data storage system, whether an incoming data item corresponding to the metadata update request already exists on the data storage system using the sparse index of weak hashes stored in the fast-access memory; and
rejecting, by the data storage system, the incoming data item based on determining that the incoming data item already exists on the data storage system.
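The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything the claim leaves open is an assumption made here for illustration: the block sizes, SHA-256 as the strong hash, CRC-32 as the weak hash, picking the numerically lowest strong hashes as representatives, and confirming a sparse-index hit against the stored list of strong hashes. The class name SparseIndexStore and its methods are hypothetical, and in-memory dictionaries stand in for both the RAM-resident sparse index and the disk-resident hash lists.

```python
import hashlib
import zlib

# Assumed sizes; the claim does not specify them.
LARGE_BLOCK_SIZE = 4096
SMALL_BLOCK_SIZE = 512
REPRESENTATIVES_PER_BLOCK = 2  # assumed number of representative hashes


def split(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def strong_hash(block: bytes) -> bytes:
    """Strong hash of a small block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).digest()


def weak_hash(strong: bytes) -> int:
    """Weak hash derived from a strong hash (CRC-32 is an assumption)."""
    return zlib.crc32(strong)


class SparseIndexStore:
    """Hypothetical store: a sparse weak-hash index plus per-block hash lists."""

    def __init__(self) -> None:
        # Stands in for the sparse index held in RAM:
        # weak hash -> locations of large blocks it represents.
        self.sparse_index: dict[int, list[int]] = {}
        # Stands in for disk storage:
        # location -> list of strong hashes of that large block.
        self.hash_lists: dict[int, list[bytes]] = {}
        self.blocks: list[bytes] = []  # the large blocks themselves

    def _describe(self, large: bytes) -> tuple[list[bytes], list[bytes]]:
        """Return (all strong hashes, representative strong hashes)."""
        strong = [strong_hash(s) for s in split(large, SMALL_BLOCK_SIZE)]
        # Assumed selection rule: the lowest distinct hash values.
        reps = sorted(set(strong))[:REPRESENTATIVES_PER_BLOCK]
        return strong, reps

    def index_item(self, data: bytes) -> None:
        """Divide the item, hash it, and add it to both structures."""
        for large in split(data, LARGE_BLOCK_SIZE):
            strong, reps = self._describe(large)
            location = len(self.blocks)  # pointer to the large block
            self.blocks.append(large)
            self.hash_lists[location] = strong
            for rep in reps:
                self.sparse_index.setdefault(weak_hash(rep), []).append(location)

    def exists(self, data: bytes) -> bool:
        """Check every large block of the item via the sparse index."""
        larges = split(data, LARGE_BLOCK_SIZE)
        if not larges:
            return False
        for large in larges:
            strong, reps = self._describe(large)
            candidates = {
                loc
                for rep in reps
                for loc in self.sparse_index.get(weak_hash(rep), [])
            }
            # Weak hashes can collide, so confirm against the strong hashes.
            if not any(self.hash_lists[loc] == strong for loc in candidates):
                return False
        return True

    def handle_update(self, data: bytes) -> str:
        """Reject an incoming item that already exists, else index it."""
        if self.exists(data):
            return "rejected"
        self.index_item(data)
        return "stored"


store = SparseIndexStore()
item = bytes(i % 251 for i in range(10_000))
first = store.handle_update(item)   # item is new
second = store.handle_update(item)  # same item again
```

Keeping only a few weak hashes per large block is what makes the index sparse: the in-memory structure stays small, and the full lists of strong hashes are consulted only for the candidate blocks that the weak hashes point to.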