| CPC G06F 16/215 (2019.01) [G06F 16/221 (2019.01); G06F 16/24556 (2019.01); G06F 16/273 (2019.01); G06F 16/282 (2019.01)] | 11 Claims |

|
1. A deduplication system for a storage unit, the deduplication system comprising:
an associative memory device to perform associative processing and comprising a memory array having columns divided into sections, of which a fingerprint section stores a plurality of fingerprints associated with blocks of data, each fingerprint being stored in a separate column of said fingerprint section;
said associative memory device also comprising:
a similarity searcher operating on said columns to receive an input fingerprint of an input block and to perform a search inside columns of said fingerprint section for a similar fingerprint whose distance to said input fingerprint is smaller than a predetermined threshold value; and
a difference calculator operating on said columns to compute a difference block indicating relative changes between said input block and a similar block associated with said similar fingerprint, if found; and
a difference block storage manager to, if said difference block is a non-empty difference block, associate said input fingerprint with said similar block and with said difference block, store said input fingerprint in one column of said fingerprint section, and store said non-empty difference block in said storage unit,
wherein said fingerprint section is arranged in a multi-level structure wherein upper levels comprise centroids to clusters in lower levels, and a lowest level comprises fingerprints of blocks, said centroids calculated from said fingerprints.
|
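
The data flow recited in claim 1 can be illustrated in software. The following is a minimal sketch only, not the claimed associative-memory implementation: it runs sequentially on a CPU, whereas the claim recites a search operating on memory columns. All names (`DedupSketch`, `Entry`, `ingest`, `find_similar`) are hypothetical. It further assumes 16-bit fingerprints compared by Hamming distance, a bitwise-majority centroid, equal-length blocks, a two-level structure, and differences taken against the full block that the matching fingerprint refers to. The handling of the "no similar fingerprint" and "empty difference" cases is likewise an assumption, since the claim does not recite it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Distance between two fingerprints (assumed metric: Hamming)."""
    return bin(a ^ b).count("1")


def diff_block(new: bytes, base: bytes) -> Dict[int, int]:
    """Relative changes as {offset: new byte}; assumes equal-length blocks."""
    return {i: n for i, (n, o) in enumerate(zip(new, base)) if n != o}


def centroid(fps: List[int], bits: int = 16) -> int:
    """Assumed centroid: bitwise majority vote over a cluster's fingerprints."""
    out = 0
    for i in range(bits):
        ones = sum((fp >> i) & 1 for fp in fps)
        if 2 * ones > len(fps):
            out |= 1 << i
    return out


@dataclass
class Entry:
    fingerprint: int
    base_id: int                      # index of the full block in `blocks`
    diff: Optional[Dict[int, int]]    # None when the block is stored in full


@dataclass
class DedupSketch:
    threshold: int
    blocks: List[bytes] = field(default_factory=list)          # stand-in for the storage unit
    clusters: List[List[Entry]] = field(default_factory=list)  # lowest level, grouped by cluster

    def find_similar(self, fp: int) -> Optional[Tuple[List[Entry], Entry]]:
        if not self.clusters:
            return None
        # Upper level: choose the cluster whose centroid is nearest.
        cluster = min(
            self.clusters,
            key=lambda c: hamming(fp, centroid([e.fingerprint for e in c])),
        )
        # Lowest level: nearest fingerprint within that cluster.
        best = min(cluster, key=lambda e: hamming(fp, e.fingerprint))
        if hamming(fp, best.fingerprint) < self.threshold:
            return cluster, best
        return None

    def ingest(self, fp: int, block: bytes) -> str:
        found = self.find_similar(fp)
        if found is None:
            # Assumption: with no similar fingerprint, store the block in full.
            self.blocks.append(block)
            self.clusters.append([Entry(fp, len(self.blocks) - 1, None)])
            return "stored_full"
        cluster, match = found
        delta = diff_block(block, self.blocks[match.base_id])
        if not delta:
            # Assumption: an empty difference means nothing new is stored.
            return "duplicate"
        # Non-empty difference: associate the fingerprint with the similar
        # block and the difference, and keep only the difference.
        cluster.append(Entry(fp, match.base_id, delta))
        return "stored_diff"
```

In this sketch, a call such as `DedupSketch(threshold=3).ingest(fp, block)` returns a short status string indicating which branch was taken. The centroid is recomputed on every search for brevity; a practical design would presumably cache it and update it as fingerprints are added.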