US 12,135,691 B1
Searching for and storing data chunks based off similarity
Yogev Vaknin, Tel Aviv (IL); Niko Farhi, Tel Aviv (IL); and Asaf Levy, Tel Aviv (IL)
Assigned to VAST DATA LTD., Tel Aviv (IL)
Filed by VAST DATA LTD., Tel Aviv (IL)
Filed on Oct. 26, 2022, as Appl. No. 18/050,046.
Int. Cl. G06F 16/174 (2019.01); G06F 16/14 (2019.01); G06F 16/16 (2019.01)
CPC G06F 16/1744 (2019.01) [G06F 16/152 (2019.01); G06F 16/164 (2019.01)]
25 Claims
 
1. A method for storing a received data chunk in a storage system, the method comprises:
obtaining a received fingerprint of the received data chunk, wherein the received fingerprint comprises received fingerprint elements that are associated with content elements of the received data chunk, and wherein each of the received fingerprint elements is indicative of a number of occurrences, within the received data chunk, of an associated content element of the received data chunk elements, wherein the received fingerprint elements are ordered within the received fingerprint, according to a given order of the content elements;
searching, within a tree, for a similar stored fingerprint, the similar stored fingerprint being a stored fingerprint that is similar to the received fingerprint; wherein the tree comprises tree nodes that represent multiple stored fingerprints of stored data chunks that are stored in the storage system; wherein different levels of the tree are allocated to different content elements, wherein each level of the tree is allocated to a specific content element, out of a set of content elements that may appear in a data chunk;
compressing, when finding the similar stored fingerprint, the received data chunk based on a similar data chunk associated with the similar stored fingerprint, and updating storage system metadata to indicate that the received data chunk is stored in the storage system in a compressed form, and based on the similar stored data chunk;
wherein the updating the storage system metadata comprises updating a metadata table with an entry corresponding to the received data chunk that depends on the similar data chunk, a pointer to the corresponding similar data chunk, and a pointer to a delta portion resulting from the compression; and
storing, when failing to find the similar stored fingerprint, the received data chunk, and updating the tree to indicate that the received data chunk is stored in the storage system.
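
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the content elements are taken to be the four byte values in the hypothetical ALPHABET, two fingerprints are deemed "similar" when every per-element count differs by at most the hypothetical TOLERANCE (the claim does not define the similarity test), and the compression based on the similar chunk is stood in for by zlib with the similar chunk as a preset dictionary. The names FingerprintTree, Storage, store and read are likewise hypothetical.

```python
import zlib
from collections import Counter

# Assumption: the set of content elements, in their given order.
ALPHABET = b"abcd"
# Assumption: maximum per-element count difference still treated as similar.
TOLERANCE = 1


def fingerprint(chunk: bytes) -> tuple:
    """Number of occurrences of each content element, in ALPHABET order."""
    counts = Counter(chunk)
    return tuple(counts[element] for element in ALPHABET)


class FingerprintTree:
    """Nested dicts; level i is allocated to content element ALPHABET[i]."""

    def __init__(self):
        self.root = {}

    def insert(self, fp, chunk_id):
        node = self.root
        for count in fp[:-1]:
            node = node.setdefault(count, {})
        node[fp[-1]] = chunk_id  # leaf holds the id of the stored chunk

    def find_similar(self, fp, node=None, level=0):
        node = self.root if node is None else node
        for count, child in node.items():
            if abs(count - fp[level]) > TOLERANCE:
                continue  # prune branches whose count is too far off
            if level == len(fp) - 1:
                return child
            found = self.find_similar(fp, child, level + 1)
            if found is not None:
                return found
        return None


class Storage:
    def __init__(self):
        self.tree = FingerprintTree()
        self.chunks = {}    # chunk id -> bytes of chunks stored in full
        self.deltas = {}    # delta id -> delta portion
        self.metadata = {}  # chunk id -> pointers to base chunk and delta
        self.next_id = 0

    def store(self, chunk: bytes) -> int:
        chunk_id = self.next_id
        self.next_id += 1
        fp = fingerprint(chunk)
        base_id = self.tree.find_similar(fp)
        if base_id is not None:
            # Similar fingerprint found: compress against the similar chunk.
            comp = zlib.compressobj(zdict=self.chunks[base_id])
            self.deltas[chunk_id] = comp.compress(chunk) + comp.flush()
            self.metadata[chunk_id] = {"base": base_id, "delta": chunk_id}
        else:
            # No similar fingerprint: store in full and update the tree.
            self.chunks[chunk_id] = chunk
            self.tree.insert(fp, chunk_id)
        return chunk_id

    def read(self, chunk_id: int) -> bytes:
        if chunk_id in self.chunks:
            return self.chunks[chunk_id]
        entry = self.metadata[chunk_id]
        decomp = zlib.decompressobj(zdict=self.chunks[entry["base"]])
        return decomp.decompress(self.deltas[entry["delta"]]) + decomp.flush()
```

In this sketch only chunks stored in full are inserted into the tree, so a compressed chunk always points at a base chunk that can be read directly. Whether the storage system described in the patent also indexes compressed chunks is not stated in claim 1.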