US 12,265,513 B2
Word aware content defined chunking
Philip N. Shilane, Newtown, PA (US)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Dec. 21, 2023, as Appl. No. 18/392,968.
Application 18/392,968 is a continuation of application No. 17/376,954, filed on Jul. 15, 2021, granted, now 11,928,092.
Prior Publication US 2024/0126733 A1, Apr. 18, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01); G06F 16/22 (2019.01); G06F 16/242 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/2255 (2019.01); G06F 16/2425 (2019.01)] 19 Claims
 
1. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising:
moving a window from a first position in a data buffer to a second position in the data buffer, and the data buffer includes one or more words;
calculating a hash value of data in the window when the window is in the second position;
checking a byte that has entered the window, as a result of a movement of the window from the first position to the second position, to determine whether the byte is whitespace; and
when the hash value is a greatest hash value seen up to a current position of the window, and when the byte is determined to be whitespace, setting a candidate offset to a whitespace offset, and the candidate offset denotes a possible segment boundary that does not fall within any word in the data buffer.
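The operations of claim 1 can be illustrated with a minimal Python sketch. The claim does not name a hash function, a window size, a whitespace set, or how the first window position and ties are handled. The names `find_candidate_offset` and `fnv1a_32`, the 32-bit FNV-1a hash, the 8-byte default window, ASCII-only whitespace, the strict "new maximum" rule, and seeding the running maximum with the first window's hash are therefore all hypothetical choices made for illustration. This is not the patented implementation.

```python
from typing import Callable, Optional

# Assumption: only ASCII whitespace bytes count as "whitespace".
WHITESPACE = frozenset(b" \t\n\r\v\f")


def fnv1a_32(data: bytes) -> int:
    """Hypothetical window hash (32-bit FNV-1a); the claim does not name one."""
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def find_candidate_offset(
    buf: bytes,
    window: int = 8,
    hash_fn: Callable[[bytes], int] = fnv1a_32,
) -> Optional[int]:
    """Slide a window over buf one byte at a time. Whenever the window hash is
    the greatest seen so far AND the byte that just entered the window is
    whitespace, record that byte's offset as the candidate segment boundary.
    Returns None if no position satisfies both conditions."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(buf) < window:
        return None
    # Assumption: the first window position seeds the running maximum.
    best_hash = hash_fn(buf[:window])
    candidate: Optional[int] = None
    # Each iteration moves the window from position start-1 to position start.
    for start in range(1, len(buf) - window + 1):
        entered = start + window - 1            # offset of the byte that entered
        h = hash_fn(buf[start:start + window])  # hash of data in the window
        if h > best_hash:                       # greatest hash seen so far
            best_hash = h
            if buf[entered] in WHITESPACE:      # entering byte is whitespace
                candidate = entered             # boundary falls between words
    return candidate
```

For example, with a toy hash that counts spaces in a 3-byte window, `find_candidate_offset(b"aa bb  cc", window=3, hash_fn=lambda w: w.count(b" "))` returns `6`, the second space in the double-space run. The sketch rehashes the whole window at every step for clarity. A practical chunker would more likely use a rolling hash so each one-byte move costs constant time, but that substitution is also an assumption rather than something the claim specifies.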