| CPC G06F 3/064 (2013.01) [G06F 3/0626 (2013.01); G06F 3/0671 (2013.01)] | 20 Claims |

|
1. A computer-implemented method for indexing a data item in a data storage system and handling metadata updates, the method comprising:
dividing, by the data storage system, the data item into one or more large blocks;
dividing, by the data storage system, each large block into a plurality of small blocks;
calculating, by the data storage system, a strong hash value for each of the small blocks of a respective large block and storing a respective list of the strong hash values calculated for the respective large block with a pointer to a location of the respective large block;
from the respective list of strong hash values calculated for the respective large block, selecting, by the data storage system, one or more representative strong hash values for the respective large block;
compiling, by the data storage system, a sparse index of weak hashes by calculating, for each of the one or more representative strong hash values, a respective weak hash, wherein each respective weak hash corresponds to a respective strong hash value for a respective large block;
storing, by the data storage system, the sparse index in a fast-access memory, wherein the fast-access memory comprises a random access memory (RAM), and wherein respective lists of strong hash values are stored in a disk storage, wherein the disk storage comprises a spinning disk storage or a solid-state drive (SSD);
receiving, by the data storage system, a metadata update request;
determining, by the data storage system, whether an incoming data item corresponding to the metadata update request already exists on the data storage system using the sparse index of weak hashes stored in the fast-access memory; and
rejecting, by the data storage system, the incoming data item based on determining that the incoming data item already exists on the data storage system.
|
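The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify block sizes, hash functions, or how representative hashes are chosen, so everything below is an assumption made for illustration: the names `Store`, `strong_hash`, `weak_hash`, and `handle_update`; the sizes `LARGE_BLOCK` and `SMALL_BLOCK`; SHA-256 as the strong hash; a truncated CRC-32 as the weak hash; and picking the smallest strong hash values as representatives. Python lists stand in for disk storage and a dictionary stands in for the RAM-resident sparse index. The confirmation of candidates against the full strong hash lists is likewise an added assumption, included because weak hashes can collide.

```python
import hashlib
import zlib

# All sizes and hash choices below are illustrative assumptions.
LARGE_BLOCK = 4096      # bytes per large block
SMALL_BLOCK = 512       # bytes per small block
REPS_PER_BLOCK = 2      # representative strong hashes kept per large block


def strong_hash(chunk: bytes) -> bytes:
    """Strong hash of one small block (SHA-256 assumed)."""
    return hashlib.sha256(chunk).digest()


def weak_hash(strong: bytes) -> int:
    """Short weak hash derived from a strong hash (16 bits of CRC-32 assumed)."""
    return zlib.crc32(strong) & 0xFFFF


class Store:
    def __init__(self):
        self.disk_blocks = []      # stand-in for disk: large block payloads
        self.disk_hash_lists = []  # stand-in for disk: (strong hash list, pointer)
        self.sparse_index = {}     # stand-in for RAM: weak hash -> large block ids

    def _split(self, data: bytes):
        """Yield (large block, strong hashes of its small blocks)."""
        for i in range(0, len(data), LARGE_BLOCK):
            large = data[i:i + LARGE_BLOCK]
            smalls = [large[j:j + SMALL_BLOCK]
                      for j in range(0, len(large), SMALL_BLOCK)]
            yield large, [strong_hash(s) for s in smalls]

    @staticmethod
    def _representatives(hashes):
        """Assumed selection rule: the smallest strong hash values."""
        return sorted(hashes)[:REPS_PER_BLOCK]

    def index(self, data: bytes) -> None:
        for large, hashes in self._split(data):
            pointer = len(self.disk_blocks)
            self.disk_blocks.append(large)
            # Full strong hash list is stored with a pointer to the block.
            self.disk_hash_lists.append((hashes, pointer))
            # Only weak hashes of the representatives go into the sparse index.
            for rep in self._representatives(hashes):
                self.sparse_index.setdefault(weak_hash(rep), set()).add(pointer)

    def exists(self, data: bytes) -> bool:
        blocks = list(self._split(data))
        for _, hashes in blocks:
            candidates = set()
            for rep in self._representatives(hashes):
                candidates |= self.sparse_index.get(weak_hash(rep), set())
            # Weak hashes may collide, so confirm against the stored lists.
            if not any(self.disk_hash_lists[c][0] == hashes for c in candidates):
                return False
        return bool(blocks)  # empty input is never reported as existing

    def handle_update(self, data: bytes) -> str:
        """Reject an incoming item that already exists; otherwise index it."""
        if self.exists(data):
            return "rejected"
        self.index(data)
        return "stored"
```

In this sketch, calling `handle_update` twice with the same bytes stores the item the first time and rejects it the second time. The sparse index holds at most `REPS_PER_BLOCK` short entries per large block, which is what keeps the in-memory structure small relative to the full strong hash lists.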