US 12,271,613 B2
Inline snapshot deduplication
Shubham Tagra, Bengaluru (IN)
Assigned to Rubrik, Inc., Palo Alto, CA (US)
Filed by Rubrik, Inc., Palo Alto, CA (US)
Filed on Nov. 1, 2022, as Appl. No. 17/978,901.
Prior Publication US 2024/0143212 A1, May 2, 2024
Int. Cl. G06F 3/06 (2006.01)
CPC G06F 3/0641 (2013.01) [G06F 3/0608 (2013.01); G06F 3/061 (2013.01); G06F 3/067 (2013.01)]
20 Claims
 
1. A method, comprising:
determining, by a data management system, to obtain a first snapshot of a first virtual machine, the first virtual machine storing a plurality of data blocks;
selecting, prior to obtaining the first snapshot of the first virtual machine and from among a plurality of previously obtained snapshots of one or more virtual machines, a second snapshot of a second virtual machine to use for deduplication of the first snapshot of the first virtual machine, wherein selecting the second snapshot is based at least in part on a second composite hash associated with the second snapshot being one of a set of second composite hashes associated with the plurality of previously obtained snapshots that is most similar to a first composite hash associated with the plurality of data blocks; and
obtaining the first snapshot of the first virtual machine after selecting the second snapshot of the second virtual machine to use for deduplication, wherein obtaining the first snapshot of the first virtual machine comprises:
writing a first subset of data blocks of the plurality of data blocks from the first virtual machine to a snapshot file for the first snapshot based at least in part on the first subset of data blocks from the first virtual machine being different from a first corresponding subset of the second snapshot; and
refraining from writing a second subset of data blocks of the plurality of data blocks from the first virtual machine to the snapshot file for the first snapshot based at least in part on the second subset of data blocks from the first virtual machine matching a second corresponding subset of the second snapshot.
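
The flow recited in claim 1 can be illustrated with a minimal sketch. The claim does not say how a composite hash is computed or how similarity is measured, so everything below is an assumption made for illustration: the composite hash is modeled as a bottom-k set of SHA-256 block hashes, similarity as Jaccard overlap, and "corresponding" blocks as blocks at the same index. All function names are hypothetical.

```python
import hashlib


def block_hash(block: bytes) -> str:
    """Fingerprint a single data block."""
    return hashlib.sha256(block).hexdigest()


def composite_hash(blocks, k=4):
    """Hypothetical composite hash: the k smallest distinct block hashes.

    This bottom-k sketch is an assumption; the claim leaves the
    construction of the composite hash unspecified.
    """
    return frozenset(sorted({block_hash(b) for b in blocks})[:k])


def similarity(a, b):
    """Jaccard similarity between two composite hashes (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_base_snapshot(vm_blocks, previous_snapshots):
    """Pick the earlier snapshot whose composite hash is most similar.

    previous_snapshots maps a snapshot name to its list of blocks.
    Selection happens before any block of the new snapshot is written.
    """
    target = composite_hash(vm_blocks)
    return max(
        previous_snapshots,
        key=lambda name: similarity(target, composite_hash(previous_snapshots[name])),
    )


def take_snapshot(vm_blocks, base_blocks):
    """Write only blocks that differ from the corresponding base block.

    Returns (written, skipped): written maps block index to data stored
    in the new snapshot file; skipped lists indices that were not written
    because they match the base snapshot at the same index.
    """
    written, skipped = {}, []
    for i, block in enumerate(vm_blocks):
        if i < len(base_blocks) and block_hash(block) == block_hash(base_blocks[i]):
            skipped.append(i)  # refrain from writing: duplicate of base
        else:
            written[i] = block  # differs from base: write to snapshot file
    return written, skipped
```

In this sketch, a caller would first run `select_base_snapshot` over the stored snapshots and then pass the chosen snapshot's blocks to `take_snapshot`, so that deduplication happens inline while the new snapshot is written rather than as a later pass.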