US 12,405,929 B2
Moving window data deduplication in distributed storage
Pavlo Padinker, Redmond, WA (US); Pavan Edara, Redmond, WA (US); and Bigang Li, Redmond, WA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Jul. 26, 2023, as Appl. No. 18/226,314.
Application 18/226,314 is a continuation of application No. 17/876,660, filed on Jul. 29, 2022, granted, now 11,762,821.
Application 17/876,660 is a continuation of application No. 17/007,495, filed on Aug. 31, 2020, granted, now 11,442,911, issued on Sep. 13, 2022.
Prior Publication US 2023/0376470 A1, Nov. 23, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/215 (2019.01); G06F 12/0804 (2016.01); G06F 16/22 (2019.01); G06F 16/23 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 12/0804 (2013.01); G06F 16/2282 (2019.01); G06F 16/2322 (2019.01); G06F 16/235 (2019.01)] 17 Claims
1. A method for deduplication, comprising:
receiving, with one or more processors, a first request to write data, the first request including a first insert identifier uniquely identifying the data for determining whether the data is duplicate data;
comparing, with the one or more processors, the first insert identifier with other insert identifiers that have been stored in a table within a time window of predetermined duration moving relative to a current time, the other insert identifiers being stored in the table based on a timestamp associated with each of the other insert identifiers;
determining, with the one or more processors, that the data corresponding to the first insert identifier is not duplicate data based on the first insert identifier not being equivalent to any of the other insert identifiers;
storing, with the one or more processors, the first insert identifier with a timestamp in the table in response to determining that the data corresponding to the first insert identifier is not duplicate data; and
updating, with the one or more processors, the table to remove one or more insert identifiers of the other insert identifiers added before the time window of predetermined duration based on the timestamp associated with each of the other insert identifiers.
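The steps recited in claim 1 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class name `MovingWindowDeduplicator`, the `write` method, the `sink` list standing in for storage, and the 60-second default window are all hypothetical choices made for illustration. The sketch also assumes a single process and a non-decreasing clock, so that insertion order in the table matches timestamp order. For simplicity it removes expired identifiers before the comparison rather than after the store, which still limits the comparison to identifiers inside the window.

```python
import time
from collections import OrderedDict


class MovingWindowDeduplicator:
    """Illustrative sketch only: tracks insert IDs seen within a moving time window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        # window_seconds is the "predetermined duration" (value chosen arbitrarily here).
        self.window_seconds = window_seconds
        # The clock is injectable so the behavior can be tested deterministically.
        self._clock = clock
        # Table mapping insert_id -> timestamp, ordered oldest first.
        self._table = OrderedDict()

    def _evict_expired(self, now):
        """Remove insert IDs whose timestamp falls before the current window."""
        cutoff = now - self.window_seconds
        while self._table:
            _, oldest_ts = next(iter(self._table.items()))
            if oldest_ts >= cutoff:
                break  # everything after this entry is newer, so stop
            self._table.popitem(last=False)

    def write(self, insert_id, data, sink):
        """Append data to sink unless insert_id was already seen in the window.

        Returns True if the data was written, False if it was a duplicate.
        """
        now = self._clock()
        self._evict_expired(now)
        if insert_id in self._table:
            return False  # duplicate within the window: drop the write
        self._table[insert_id] = now  # store the ID with its timestamp
        sink.append(data)
        return True
```

A caller would create one deduplicator per destination and pass each request's insert identifier to `write`. An identifier that reappears after the window has elapsed is treated as new, which is the trade-off of bounding the table by time instead of keeping every identifier indefinitely.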