US 11,693,572 B2
Optimized deduplication based on backup frequency in a distributed data storage system
Bharat Pundalik Naik, Palo Alto, CA (US); Xiangyu Wang, Fremont, CA (US); and Avinash Lakshman, Fremont, CA (US)
Assigned to Commvault Systems, Inc., Tinton Falls, NJ (US)
Filed by Commvault Systems, Inc., Tinton Falls, NJ (US)
Filed on Mar. 31, 2022, as Appl. No. 17/710,600.
Application 17/710,600 is a continuation of application No. 17/153,667, filed on Jan. 20, 2021, granted, now Pat. No. 11,513,708.
Claims priority of provisional application 63/070,162, filed on Aug. 25, 2020.
Prior Publication US 2022/0222000 A1, Jul. 14, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 3/06 (2006.01); H04L 67/1097 (2022.01)
CPC G06F 3/0641 (2013.01) [G06F 3/067 (2013.01); G06F 3/0608 (2013.01); G06F 3/0619 (2013.01); G06F 3/0664 (2013.01); G06F 3/0665 (2013.01); G06F 3/0683 (2013.01); H04L 67/1097 (2013.01)]
20 Claims
 
1. A computer-implemented method comprising:
by a first computing device, which comprises one or more hardware processors and data storage resources, wherein the first computing device is configured to execute a storage proxy:
intercepting a first write request addressed to a first user virtual disk configured on a distributed data storage system, wherein the first user virtual disk is distinct from a deduplication virtual disk, wherein the first write request comprises a first data block addressed to the first user virtual disk, and
causing the first data block to be stored in the deduplication virtual disk, at data storage resources of a second storage service node, wherein the second storage service node comprises one or more hardware processors and data storage resources, wherein the second storage service node is configured to execute a data storage subsystem; and
by a first storage service node, which comprises one or more hardware processors and data storage resources, wherein the first storage service node is configured to execute a metadata subsystem:
assigning an expiry timeframe to a first unique system-wide identifier that is based on a hash value of, and is associated with, the first data block,
wherein the expiry timeframe is based at least in part on an arrival timeframe of the first write request at the storage proxy and is further based on a frequency of full-backup operations configured for the first user virtual disk, and
causing the second storage service node to delete the first data block from the deduplication virtual disk, based on determining that (i) a current timeframe is later than the expiry timeframe of the first unique system-wide identifier and (ii) no user virtual disk in the distributed data storage system makes reference to the first unique system-wide identifier.
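
The flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class `Cluster`, its method names, the use of SHA-256, integer timeframes, and the formula "arrival timeframe plus full-backup frequency" are all assumptions made for illustration. The claim only requires that the expiry timeframe be based on those two inputs. The three roles (storage proxy, metadata subsystem, data storage subsystem) are collapsed into one in-memory object here, whereas the claim places them on separate devices.

```python
import hashlib


def block_id(data: bytes) -> str:
    # System-wide identifier derived from a hash of the block (SHA-256 is an assumption).
    return hashlib.sha256(data).hexdigest()


class Cluster:
    """Hypothetical toy model: one dict per role instead of separate nodes."""

    def __init__(self):
        self.dedup_vdisk = {}  # block id -> bytes (data storage subsystem)
        self.expiry = {}       # block id -> expiry timeframe (metadata subsystem)
        self.refs = {}         # block id -> set of user virtual disks referencing it

    def proxy_write(self, user_vdisk, data, arrival_tf, full_backup_every):
        """Storage proxy path: intercept a write addressed to a user virtual
        disk and store the block in the deduplication virtual disk instead."""
        bid = block_id(data)
        self.dedup_vdisk.setdefault(bid, data)  # identical blocks are stored once
        self.refs.setdefault(bid, set()).add(user_vdisk)
        # Assumed formula: arrival timeframe plus full-backup frequency.
        # An existing expiry is only ever extended, never shortened.
        candidate = arrival_tf + full_backup_every
        self.expiry[bid] = max(self.expiry.get(bid, candidate), candidate)
        return bid

    def drop_reference(self, user_vdisk, bid):
        # Called when a user virtual disk no longer refers to the block.
        self.refs.get(bid, set()).discard(user_vdisk)

    def garbage_collect(self, current_tf):
        """Delete blocks that are (i) past their expiry timeframe and
        (ii) referenced by no user virtual disk. Returns the deleted ids."""
        deleted = []
        for bid, exp in list(self.expiry.items()):
            if current_tf > exp and not self.refs.get(bid):
                del self.dedup_vdisk[bid]
                del self.expiry[bid]
                self.refs.pop(bid, None)
                deleted.append(bid)
        return deleted
```

With these assumptions, a block written at timeframe 10 to a disk with a full-backup frequency of 7 expires at timeframe 17. `garbage_collect(18)` leaves it in place while any user virtual disk still references it and deletes it only after the last reference is dropped, which mirrors the two-part condition in the final limitation of the claim.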