CPC G06F 16/215 (2019.01) [G06F 16/24568 (2019.01); H04L 9/0643 (2013.01)] | 20 Claims |
1. A method for data processing, comprising:
(a) receiving one or more input data streams from one or more client applications;
(b) generating at least a first segment and a second segment from said one or more input data streams, wherein said first segment comprises a first set of chunks and said second segment comprises a second set of chunks;
(c) computing (i) a first set of fingerprints of said first set of chunks and (ii) a second set of fingerprints of said second set of chunks;
(d) comparing said first set of fingerprints with said second set of fingerprints to generate a similarity score indicative of a degree of similarity between said first segment and said second segment; and
(e) upon determining said second segment is similar to said first segment when said similarity score is equal to or greater than a similarity threshold that is between 5% to 99%, further processing said first set of chunks of said first segment and said second set of chunks of said second segment by performing a differencing operation to determine a difference between said first segment and said second segment, wherein the difference includes at least a difference between a first chunk from said first set of chunks and a second chunk from said second set of chunks.
|
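Illustrative sketch (not part of the claim): steps (b) through (e) of claim 1 can be approximated in a few lines of Python. Everything specific here is an assumption made for illustration and is not stated in the claim: fixed-size chunking with a hypothetical CHUNK_SIZE, SHA-256 as the fingerprint function, a Jaccard ratio over the fingerprint sets as the similarity score, a hypothetical SIMILARITY_THRESHOLD of 50% (the claim only requires a value from 5% to 99%), and difflib.SequenceMatcher as the differencing operation. The function names are likewise hypothetical.

```python
import hashlib
from difflib import SequenceMatcher

CHUNK_SIZE = 8              # hypothetical fixed chunk size in bytes (assumption)
SIMILARITY_THRESHOLD = 0.5  # hypothetical; the claim allows 5% to 99%


def chunk(segment: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Step (b): split a segment into fixed-size chunks (assumed chunking scheme)."""
    return [segment[i:i + size] for i in range(0, len(segment), size)]


def fingerprints(chunks: list[bytes]) -> list[str]:
    """Step (c): one fingerprint per chunk; SHA-256 is an assumed choice."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def similarity(fp_a: list[str], fp_b: list[str]) -> float:
    """Step (d): Jaccard ratio of the two fingerprint sets (assumed score)."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diff_segments(chunks_a: list[bytes], chunks_b: list[bytes]):
    """Step (e): difference the chunks only when the segments are similar.

    Returns (score, ops). ops is None when the score is below the threshold;
    otherwise it lists the non-equal chunk-level edit operations.
    """
    fp_a, fp_b = fingerprints(chunks_a), fingerprints(chunks_b)
    score = similarity(fp_a, fp_b)
    if score < SIMILARITY_THRESHOLD:
        return score, None
    # Align the two fingerprint sequences and keep only the differing runs.
    matcher = SequenceMatcher(a=fp_a, b=fp_b, autojunk=False)
    ops = [
        (tag, chunks_a[i1:i2], chunks_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return score, ops


# Usage: two segments that differ in exactly one chunk.
first = chunk(b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD")
second = chunk(b"AAAAAAAABBBBBBBBXXXXXXXXDDDDDDDD")
score, ops = diff_segments(first, second)
```

In this sketch the fingerprint comparison acts as a cheap filter, so the more expensive differencing runs only on segment pairs that already share enough chunks. A production system would more likely use content-defined chunking and a byte-level delta within the differing chunks; neither is shown here.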