US 12,032,534 B2
Inline deduplication using stream detection
Nickolay Dalmatov, St. Petersburg (RU); Richard Ruef, Santa Cruz, CA (US); and Kurt Everson, Richmond, TX (US)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Aug. 2, 2019, as Appl. No. 16/530,139.
Prior Publication US 2021/0034584 A1, Feb. 4, 2021
Int. Cl. G06F 16/215 (2019.01); G06F 16/2455 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/24568 (2019.01)]
20 Claims
 
1. A method, comprising:
receiving a digest for a deduplication candidate;
detecting at least one stream associated with the deduplication candidate, wherein the at least one stream is detected based on analyzing a target volume of the deduplication candidate, a target logical unit of the deduplication candidate, a target logical unit identifier of the deduplication candidate, sequential writing of the deduplication candidate, writing targeted to a limited region of logical space associated with the deduplication candidate, or a client host and port identifiers of the deduplication candidate;
loading at least one neighboring digest segment of a first loaded digest segment associated with the at least one stream, wherein the at least one neighboring digest segment and the first digest segment are loaded in an index table associated with the at least one stream, and wherein the neighboring digest segment comprises a digest segment that is located sequentially preceding or sequentially following the first digest segment in the index table;
determining whether the digest is located in the at least one neighboring digest segment; and
based on a negative result of the determining, processing the digest, the processing including:
generating a mask;
determining if the digest qualifies as a sample digest based on the mask; and
based on a positive determination that the digest qualifies as a sample digest, searching for the digest in the index table associated with the at least one stream.
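
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say how the mask is generated, how a sample digest is recognized, or how the index table is laid out, so everything below is an assumption made for illustration: the names (make_digest, detect_stream, StreamIndex, make_mask, is_sample, lookup), the stream key (target volume plus logical unit identifier), the rule that a digest is a sample when its masked low bits are zero, and the return labels.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def make_digest(value: int) -> bytes:
    """Hypothetical helper: a 32-byte big-endian digest with a chosen value."""
    return value.to_bytes(32, "big")


def detect_stream(volume: str, lun_id: int) -> Tuple[str, int]:
    """Assumed stream key: target volume plus target logical unit identifier."""
    return (volume, lun_id)


@dataclass
class StreamIndex:
    """Index table for one stream: digest segments kept in sequential order."""
    segments: List[Set[bytes]]
    loaded: Set[int] = field(default_factory=set)

    def load_neighbors(self, pos: int) -> List[int]:
        """Mark the segments just before and just after `pos` as loaded."""
        neighbors = [p for p in (pos - 1, pos + 1) if 0 <= p < len(self.segments)]
        self.loaded.update([pos, *neighbors])
        return neighbors


def make_mask(bits: int) -> int:
    """Assumed mask: the lowest `bits` bits set."""
    return (1 << bits) - 1


def is_sample(digest: bytes, mask: int) -> bool:
    """Assumed rule: a digest is a sample when its masked low bits are all zero."""
    return int.from_bytes(digest[-4:], "big") & mask == 0


def lookup(index: StreamIndex, first_pos: int, digest: bytes, mask_bits: int = 2) -> str:
    # Load the neighbors of the first loaded segment and check them first.
    neighbors = index.load_neighbors(first_pos)
    if any(digest in index.segments[p] for p in neighbors):
        return "hit-neighbor"
    # Negative result: generate a mask and test whether the digest is a sample.
    mask = make_mask(mask_bits)
    if not is_sample(digest, mask):
        return "skipped"
    # Sample digest: search the stream's whole index table.
    if any(digest in segment for segment in index.segments):
        return "hit-index"
    return "miss"


# Usage: one stream whose index table holds four digest segments.
tables = {
    detect_stream("vol0", 7): StreamIndex(
        [
            {make_digest(1), make_digest(2)},
            {make_digest(3), make_digest(4)},
            {make_digest(5), make_digest(6)},
            {make_digest(8), make_digest(9)},
        ]
    )
}
index = tables[detect_stream("vol0", 7)]
results = [lookup(index, 1, make_digest(v)) for v in (5, 8, 9, 12)]
print(results)
```

In this sketch, only digests that pass the mask test trigger a search of the full index table. The sketch assumes this keeps the common sequential case cheap, because a stream that writes in order is likely to find its digests in the neighboring segments that are already loaded.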