US 11,928,092 B2
Word aware content defined chunking
Philip N. Shilane, Newtown, PA (US)
Assigned to EMC IP HOLDING COMPANY LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Jul. 15, 2021, as Appl. No. 17/376,954.
Prior Publication US 2023/0017347 A1, Jan. 19, 2023
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01); G06F 16/22 (2019.01); G06F 16/242 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/2255 (2019.01); G06F 16/2425 (2019.01)]
20 Claims
 
1. A method, comprising:
moving a window from a first position in a data buffer to a second position in the data buffer, and the data buffer includes one or more words;
calculating a hash value of data in the window when the window is in the second position;
checking a byte that has entered the window, as a result of a movement of the window from the first position to the second position, to determine whether the byte is whitespace; and
when the hash value is a greatest hash value seen up to a current position of the window, and when the byte is determined to be whitespace, setting a candidate offset to a whitespace offset, and the candidate offset denotes a possible segment boundary that does not fall within any word in the data buffer.
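The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, only one possible reading under stated assumptions: the names `candidate_offset`, `WHITESPACE`, and `window` are hypothetical; `zlib.crc32` recomputed over each window stands in for whatever rolling hash an actual system would use; the running maximum is updated at every window position; and the initial window placement is treated like a move, so its last byte counts as the byte that entered.

```python
import zlib

# Byte values treated as whitespace (an assumption; the claim does not enumerate them).
WHITESPACE = frozenset(b" \t\n\r\x0b\x0c")


def candidate_offset(buf: bytes, window: int = 4):
    """Return a whitespace offset chosen as a possible segment boundary, or None.

    Illustrative only: slides a fixed-size window one byte at a time and
    keeps the offset of a whitespace byte whose window hash is the greatest
    hash seen up to that position.
    """
    max_hash = -1      # crc32 is non-negative, so the first window always exceeds this
    candidate = None   # no candidate until a whitespace byte coincides with a new maximum

    for end in range(window, len(buf) + 1):
        entered = end - 1                        # index of the byte that just entered the window
        h = zlib.crc32(buf[end - window:end])    # stand-in for a rolling hash of the window

        if h > max_hash:                         # greatest hash value seen so far
            max_hash = h
            if buf[entered] in WHITESPACE:       # entering byte is whitespace
                candidate = entered              # boundary cannot fall inside a word

    return candidate
```

Under these assumptions, any offset returned always points at a whitespace byte, so a segment cut at that offset does not split a word. A buffer containing no whitespace, or one shorter than the window, yields no candidate at all.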