US 11,886,298 B2
Using a storage log to generate an incremental backup
Amandeep Gautam, San Jose, CA (US); Anand Arun, San Jose, CA (US); Debasish Garai, Santa Clara, CA (US); Rupesh Bajaj, Dewas (IN); Himanshu Mehra, Mountain View, CA (US); Vairavanathan Emalayan, Vancouver (CA); and Apurv Gupta, Bangalore (IN)
Assigned to Cohesity, Inc., San Jose, CA (US)
Filed by Cohesity, Inc., San Jose, CA (US)
Filed on Mar. 31, 2021, as Appl. No. 17/218,619.
Prior Publication US 2022/0318095 A1, Oct. 6, 2022
Int. Cl. G06F 11/14 (2006.01); G06F 16/11 (2019.01)
CPC G06F 11/1451 (2013.01) [G06F 16/128 (2019.01)] 16 Claims
1. A method, comprising:
receiving an identification of a new primary snapshot created for a primary storage system;
determining a threshold time window based on a capture time associated with the new primary snapshot, wherein:
the capture time is generated using a snapshot service clock;
the threshold time window includes a pre-new primary snapshot marker and a post-new primary snapshot marker;
the pre-new primary snapshot marker corresponds to a first event in a storage log before the capture time;
the pre-new primary snapshot marker is determined based on a storage log clock, wherein the snapshot service clock is different from the storage log clock;
the post-new primary snapshot marker is determined based on the storage log clock; and
the post-new primary snapshot marker corresponds to a second event in the storage log after the capture time;
analyzing entries of the storage log of the primary storage system occurring within the threshold time window to identify any objects of the primary storage system that have changed during the threshold time window; and
identifying changed objects to capture in a new secondary backup stored at a secondary storage system and corresponding to the new primary snapshot, including by comparing metadata of the new primary snapshot and metadata of a previous secondary backup to determine for each of the objects of the primary storage system identified as having changed during the threshold time window whether a change to an object since the previous secondary backup is captured in the new primary snapshot, wherein the identifying of the changed objects comprises:
in response to determining that a timestamp of the snapshot service clock differs from a corresponding timestamp of the storage log clock, traversing a corresponding directory of the new primary snapshot that is associated with a changed object of the changed objects.
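The steps of claim 1 can be illustrated with a minimal Python sketch. It is not Cohesity's implementation, and every name in it is a hypothetical assumption for illustration. These assumed names include the log record (`LogEntry`), the snapshot layout (`snapshot_dirs`, a mapping from directory path to entry versions), the integer versions standing in for snapshot and backup metadata, and the `max_skew` bound used to place the markers. The sketch first brackets the snapshot capture time with a pre-snapshot marker and a post-snapshot marker taken from the storage log. It then collects the objects whose log events fall between those markers. When the snapshot service clock and the storage log clock disagree, it traverses each object's parent directory in the snapshot and compares the object's metadata with the previous backup.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
import posixpath


@dataclass(frozen=True)
class LogEntry:
    log_ts: float  # event time according to the storage log clock (hypothetical record)
    path: str      # object modified by the event


def threshold_window(log, capture_ts, max_skew):
    """Return (pre_idx, post_idx), the indices of the pre- and post-snapshot markers.

    `log` must be sorted by log_ts. `capture_ts` comes from the snapshot service
    clock, so the window around it is widened by the assumed skew bound `max_skew`.
    pre_idx is -1 and post_idx is len(log) when no such marker event exists.
    """
    times = [e.log_ts for e in log]
    # Pre-marker: last log event strictly before capture_ts - max_skew.
    pre_idx = bisect_left(times, capture_ts - max_skew) - 1
    # Post-marker: first log event strictly after capture_ts + max_skew.
    post_idx = bisect_right(times, capture_ts + max_skew)
    return pre_idx, post_idx


def entries_in_window(log, pre_idx, post_idx):
    """Return the log entries strictly between the two markers."""
    return log[pre_idx + 1:post_idx]


def lookup_in_snapshot(snapshot_dirs, path):
    """Traverse the snapshot directory containing `path`; return its version or None."""
    parent, name = posixpath.split(path)
    return snapshot_dirs.get(parent, {}).get(name)


def objects_to_capture(window, capture_ts, snapshot_dirs, prev_backup, clocks_differ):
    """Decide which objects changed inside the window belong in the new backup."""
    changed = set()
    for entry in window:
        if clocks_differ:
            # Log order cannot be trusted relative to capture_ts, so inspect the
            # snapshot itself and compare with the previous backup's metadata.
            # An object absent from the snapshot but present in the previous
            # backup (a deletion) also counts as a change.
            if lookup_in_snapshot(snapshot_dirs, entry.path) != prev_backup.get(entry.path):
                changed.add(entry.path)
        elif entry.log_ts <= capture_ts:
            # With agreeing clocks, the timestamp alone shows the snapshot saw the change.
            changed.add(entry.path)
    return changed


# Usage with made-up data: the change at t=3 is in the snapshot, the one at t=4 is not.
log = [LogEntry(1, "/a/x"), LogEntry(2, "/a/y"), LogEntry(3, "/a/z"),
       LogEntry(4, "/b/w"), LogEntry(5, "/b/v"), LogEntry(6, "/a/x")]
pre, post = threshold_window(log, capture_ts=3.5, max_skew=1.0)
window = entries_in_window(log, pre, post)  # events at t=3 and t=4
snapshot_dirs = {"/a": {"x": 2, "y": 2, "z": 2}, "/b": {"w": 1}}
prev_backup = {"/a/x": 1, "/a/y": 1, "/a/z": 1, "/b/w": 1}
print(sorted(objects_to_capture(window, 3.5, snapshot_dirs, prev_backup, clocks_differ=True)))
# prints ['/a/z']
```

Widening the window by a fixed skew bound is only one plausible way to place markers when the two clocks differ. The claim does not say how the markers are chosen. Objects changed before the pre-snapshot marker fall outside this sketch.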