US 12,293,076 B2
High-compression, high-volume deduplication cache
Luke A. Higgins, Silver Spring, MD (US); and Robert R. Bruno, Columbia, MD (US)
Assigned to MORGAN STANLEY SERVICES GROUP INC., New York, NY (US)
Filed by MORGAN STANLEY SERVICES GROUP INC., New York, NY (US)
Filed on Jul. 24, 2022, as Appl. No. 17/871,972.
Application 17/871,972 is a continuation in part of application No. 17/502,898, filed on Oct. 15, 2021, granted, now 11,422,977.
Prior Publication US 2023/0124863 A1, Apr. 20, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 3/00 (2006.01); G06F 3/06 (2006.01); G06F 12/00 (2006.01); G06F 12/084 (2016.01)
CPC G06F 3/0608 (2013.01) [G06F 3/0641 (2013.01); G06F 3/0673 (2013.01); G06F 12/084 (2013.01); G06F 2212/62 (2013.01)] 18 Claims
1. A system for caching and deduplicating a plurality of segments of data, comprising:
at least one server having one or more processors;
at least one database; and
non-transitory memory comprising instructions that, when executed by the one or more processors, cause the one or more processors to:
receive, at the server, a first segment of data from the plurality of segments of data;
identify a value of a first data field in the first segment of data, the value of the first data field comprising a unique source identifier;
perform a transformation on the value of the first data field to obtain a transformed source identifier;
identify a value of a second data field in the first segment of data, the second data field being densely populated by values in the plurality of received segments of data;
partition the value of the second data field into a first partition comprising more significant bits and a second partition comprising less significant bits;
generate a first key based on the transformed source identifier and the first partition comprising more significant bits;
store, in the at least one database, an entry associating the first key with a bitmap, the bitmap having a maximum length equal to a maximum number of possible values a bitmap of equal length to the second partition could validly take;
set a single bit of the bitmap, corresponding to a value of the second partition, to true;
receive, at the server, a second segment of data from the plurality of segments of data;
likewise generate a second key based on a transformed source identifier in the second segment of data and a value of a first partition of the second data field in the second segment of data;
retrieve a bitmap associated with the second key; and
based on a set bit in the retrieved bitmap corresponding to a value of the second partition of the second data field of the second segment of data, determine that the second segment of data had previously been received by the server.
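The steps of claim 1 can be illustrated with a minimal, non-authoritative Python sketch. The claim does not specify the transformation, the bit widths, the database, or the bitmap encoding, so every such choice below is a hypothetical assumption. The sketch uses an 8-byte BLAKE2b digest as the "transformation" and a 16-bit "second partition", which gives a 2**16 = 65,536-bit (8,192-byte) bitmap per key. It uses a Python dict as a stand-in for "the at least one database" and a bytearray as the bitmap. The names transform_source_id, partition, DedupCache, and is_duplicate are illustrative only. Treating the "second data field" as something like a sequence number is also an assumption.

```python
import hashlib

# Hypothetical parameters: the claim does not fix any field widths.
LOW_BITS = 16                    # width of the second (less significant) partition
BITMAP_BITS = 1 << LOW_BITS      # 2**16 = 65,536 possible low-partition values
BITMAP_BYTES = BITMAP_BITS // 8  # 8,192 bytes per bitmap


def transform_source_id(source_id: str) -> bytes:
    """Hypothetical transformation: an 8-byte BLAKE2b digest of the source identifier."""
    return hashlib.blake2b(source_id.encode("utf-8"), digest_size=8).digest()


def partition(value: int) -> tuple[int, int]:
    """Split a non-negative integer into (more significant bits, less significant bits)."""
    return value >> LOW_BITS, value & (BITMAP_BITS - 1)


class DedupCache:
    """Toy deduplication cache; a dict stands in for the claimed database."""

    def __init__(self) -> None:
        # Maps (transformed source ID, high partition) -> bitmap over low-partition values.
        self._db: dict[tuple[bytes, int], bytearray] = {}

    def is_duplicate(self, source_id: str, dense_value: int) -> bool:
        """Return True if (source_id, dense_value) was seen before, then mark it as seen."""
        high, low = partition(dense_value)
        key = (transform_source_id(source_id), high)
        bitmap = self._db.get(key)
        if bitmap is None:
            # First value in this high-bit range for this source: store an all-zero bitmap.
            bitmap = bytearray(BITMAP_BYTES)
            self._db[key] = bitmap
        byte_index, mask = low >> 3, 1 << (low & 7)
        seen = bool(bitmap[byte_index] & mask)
        bitmap[byte_index] |= mask  # set the single bit for this low-partition value
        return seen


if __name__ == "__main__":
    cache = DedupCache()
    print(cache.is_duplicate("feed-A", 1_000_001))  # False: first receipt
    print(cache.is_duplicate("feed-A", 1_000_001))  # True: same key, same bit already set
    print(cache.is_duplicate("feed-A", 1_000_002))  # False: same key, different bit
```

Under these assumptions, 1,000,001 and 1,000,002 share the high partition 15 (1,000,001 = 15 * 65,536 + 16,961). Both values are therefore tracked in one bitmap entry rather than two. This reflects the role of the "densely populated" second data field in the claim: consecutive values tend to share a key, so a single stored entry can cover up to 65,536 of them.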