US 12,259,859 B2
Deduplication of data via associative similarity search
Avidan Akerib, Tel Aviv (IL); Dan Ilan, Herzliya (IL); Eli Ehrman, Beit Shemesh (IL); and Elona Erez, Tel Aviv (IL)
Assigned to GSI Technology Inc., Sunnyvale, CA (US)
Filed by GSI Technology Inc., Sunnyvale, CA (US)
Filed on Jun. 25, 2020, as Appl. No. 16/911,429.
Claims priority of provisional application 62/978,336, filed on Feb. 19, 2020.
Claims priority of provisional application 62/888,580, filed on Aug. 19, 2019.
Prior Publication US 2021/0056085 A1, Feb. 25, 2021
Int. Cl. G06F 16/21 (2019.01); G06F 16/215 (2019.01); G06F 16/22 (2019.01); G06F 16/2455 (2019.01); G06F 16/27 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/221 (2019.01); G06F 16/24556 (2019.01); G06F 16/273 (2019.01); G06F 16/282 (2019.01)]
11 Claims
 
1. A deduplication system for a storage unit, the deduplication system comprising:
an associative memory device to perform associative processing and comprising a memory array having columns divided into sections, of which a fingerprint section stores a plurality of fingerprints associated with blocks of data, each fingerprint being stored in a separate column of said fingerprint section;
said associative memory device also comprising:
a similarity searcher operating on said columns to receive an input fingerprint of an input block and to perform a search inside columns of said fingerprint section for a similar fingerprint whose distance to said input fingerprint is smaller than a predetermined threshold value; and
a difference calculator operating on said columns to compute a difference block indicating relative changes between said input block and a similar block associated with said similar fingerprint, if found; and
a difference block storage manager to, if said difference block is a non-empty difference block, associate said input fingerprint with said similar block and with said difference block, store said input fingerprint in one column of said fingerprint section, and store said non-empty difference block in said storage unit,
wherein said fingerprint section is arranged in a multi-level structure wherein upper levels comprise centroids to clusters in lower levels, and a lowest level comprises fingerprints of blocks, said centroids calculated from said fingerprints.
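The flow recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed associative-memory implementation. It assumes integer fingerprints compared by Hamming distance, fixed-size blocks, a byte-wise difference format, a bitwise-majority centroid, and a policy of opening a new cluster whenever no similar fingerprint is found. None of these choices is stated in the claim, and all names (`DedupStore`, `ingest`, `load`, `diff_block`) are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def diff_block(new: bytes, base: bytes) -> list:
    """(offset, new_byte) pairs where two equal-length blocks differ (assumed format)."""
    return [(i, n) for i, (n, b) in enumerate(zip(new, base)) if n != b]


def centroid(fps: list, bits: int) -> int:
    """Bitwise majority vote over fingerprints (assumed centroid definition)."""
    out = 0
    for bit in range(bits):
        ones = sum((fp >> bit) & 1 for fp in fps)
        if 2 * ones >= len(fps):
            out |= 1 << bit
    return out


class DedupStore:
    """Two-level toy model: centroids (upper level) over fingerprint clusters (lowest level)."""

    def __init__(self, threshold: int, bits: int = 16):
        self.threshold = threshold
        self.bits = bits
        self.fps = []        # one fingerprint per "column"
        self.records = []    # per column: ("full", block) or ("delta", base_col, diff)
        self.clusters = []   # column indices grouped by cluster
        self.centroids = []  # one centroid per cluster

    def load(self, col: int) -> bytes:
        """Rebuild a block, applying its difference block to the similar block if needed."""
        rec = self.records[col]
        if rec[0] == "full":
            return rec[1]
        data = bytearray(self.load(rec[1]))
        for offset, value in rec[2]:
            data[offset] = value
        return bytes(data)

    def _search(self, fp: int):
        """Pick the nearest centroid, then the nearest fingerprint inside that cluster."""
        if not self.centroids:
            return None, None
        c = min(range(len(self.centroids)),
                key=lambda i: hamming(fp, self.centroids[i]))
        best = min(self.clusters[c], key=lambda col: hamming(fp, self.fps[col]))
        if hamming(fp, self.fps[best]) < self.threshold:
            return c, best
        return c, None

    def ingest(self, fp: int, block: bytes):
        c, similar = self._search(fp)
        if similar is not None:
            diff = diff_block(block, self.load(similar))
            if not diff:
                return ("duplicate", similar)   # empty difference: nothing new to store
            record = ("delta", similar, diff)   # non-empty difference block
        else:
            record = ("full", block)
        col = len(self.fps)
        self.fps.append(fp)
        self.records.append(record)
        if similar is None:
            # Assumed policy: an unmatched fingerprint seeds a new cluster.
            self.clusters.append([col])
            self.centroids.append(fp)
        else:
            self.clusters[c].append(col)
            self.centroids[c] = centroid(
                [self.fps[i] for i in self.clusters[c]], self.bits)
        return (record[0], col)


store = DedupStore(threshold=3, bits=8)
first = store.ingest(0b10110010, b"hello world!")   # stored in full
second = store.ingest(0b10110011, b"hello World!")  # stored as a difference block
third = store.ingest(0b10110010, b"hello world!")   # identical block, not stored again
```

In this sketch the second block differs from the first by one byte, so only the pair `(6, 87)` is kept for it, together with a reference to the similar block. The third call finds a fingerprint at distance zero and an empty difference, so no new column is written.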