US 11,740,977 B2
Efficient deduplication based file movement for load balancing in a scaled-out backup system
Alok Katiyar, Santa Clara, CA (US); Srisailendra Yallapragada, Cupertino, CA (US); Chetan Risbud, Milpitas, CA (US); and Sanjay Vedanthan, Santa Clara, CA (US)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Jan. 27, 2020, as Appl. No. 16/773,906.
Prior Publication US 2021/0232459 A1, Jul. 29, 2021
Int. Cl. G06F 11/14 (2006.01); G06F 16/174 (2019.01); G06F 16/22 (2019.01); G06F 16/2457 (2019.01); G06F 16/21 (2019.01); G06F 16/11 (2019.01); G06F 11/30 (2006.01)
CPC G06F 11/1453 (2013.01) [G06F 11/1451 (2013.01); G06F 11/1464 (2013.01); G06F 11/1469 (2013.01); G06F 11/3034 (2013.01); G06F 16/128 (2019.01); G06F 16/1748 (2019.01); G06F 16/219 (2019.01); G06F 16/2246 (2019.01); G06F 16/24573 (2019.01)]
11 Claims
 
1. A computer-implemented method of dynamically balancing cloud resource capacity in a multi-node network having a file system, comprising:
providing a deduplication backup system comprising a backup server;
determining, through a cluster-wide file migration to cloud (FMIG) process that is internal to the deduplication backup system, a destination node of the multi-node network with dedicated cloud storage capable of storing a file selected for long term retention, in a local storage of a source node of the multi-node network;
negotiating an initial state of the source node and the destination node through a state sharing process of initial preparatory work including opening a new file on the destination node and storing file metadata on the local storage for a cloud tier of the destination node, wherein the metadata points to actual file data to be moved to the dedicated cloud storage;
associating a current cloud capacity to each respective node of the multi-node network;
transferring the file directly to the dedicated cloud storage without storing the actual file data on the local storage of the destination node, when a current cloud capacity of the destination node is adequate, otherwise transferring the file to cloud storage of a different node having sufficient cloud capacity to balance utilization of cloud storage among nodes of the multi-node network;
building and persisting, during the transferring step, a metadata segment tree on the destination node through metadata references for file data marked for long-term storage;
storing the metadata tree on the source node without storing the metadata in the cloud storage, wherein the transferring step uses deduplication processes of the backup server to prevent unnecessary copying of files already moved to the cloud storage by comparing data references from the metadata segment tree on the destination node to metadata stored in a tree for files that have been moved to the cloud storage;
sending, from the destination node to the source node, a list of references for data that is not moved to cloud storage for a metadata tree stored on the source node, wherein the source node responds to the destination node with any data that has not been previously moved to the cloud storage; and
updating a global namespace of the file system with a handle indicating a current location of the file as the dedicated cloud storage of the destination file through the metadata segment tree, the handle allowing access to the file through the metadata stored in the local storage of the source node.
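
Illustrative sketch (not part of the claim): the capacity check, the deduplicated transfer, and the namespace update recited above can be pictured with the following minimal Python model. Everything in it is an assumption made for illustration: the `Node` class, the functions `pick_destination` and `migrate`, the use of byte counts for cloud capacity, the "most free capacity" fallback rule, and the dictionary standing in for the global namespace are hypothetical and are not taken from the patent or from any EMC product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one node and its dedicated cloud storage."""
    name: str
    cloud_capacity: int                 # assumed: total bytes of cloud storage
    cloud_used: int = 0                 # assumed: bytes already consumed
    cloud_refs: set = field(default_factory=set)  # references already in cloud

    @property
    def cloud_free(self) -> int:
        return self.cloud_capacity - self.cloud_used


def pick_destination(preferred: Node, nodes: list, needed: int) -> Node:
    """Keep the preferred node when its cloud capacity is adequate; otherwise
    fall back to the node with the most free cloud capacity (an assumed
    balancing rule, the claim only requires 'sufficient' capacity)."""
    if preferred.cloud_free >= needed:
        return preferred
    candidates = [n for n in nodes if n.cloud_free >= needed]
    if not candidates:
        raise RuntimeError("no node has sufficient cloud capacity")
    return max(candidates, key=lambda n: n.cloud_free)


def migrate(path: str, segments: dict, dest: Node, namespace: dict) -> list:
    """segments maps a data reference (fingerprint) to the bytes held on the
    source node. Returns the references actually sent to cloud storage."""
    # Destination side: list the references not yet present in its cloud tier.
    missing = [ref for ref in segments if ref not in dest.cloud_refs]
    # Source side: respond with only the data for those references.
    for ref in missing:
        dest.cloud_used += len(segments[ref])
        dest.cloud_refs.add(ref)
    # Record the file's current location in the (modelled) global namespace.
    namespace[path] = (dest.name, "cloud")
    return missing


# Usage: node "a" is nearly full, so the file is steered to node "b",
# which already holds reference "f1" and therefore receives only "f2".
a = Node("a", cloud_capacity=10, cloud_used=9)
b = Node("b", cloud_capacity=100, cloud_used=20, cloud_refs={"f1"})
c = Node("c", cloud_capacity=100, cloud_used=50)
segments = {"f1": b"xxxx", "f2": b"yyyyyy"}
namespace = {}

dest = pick_destination(a, [a, b, c], needed=10)
sent = migrate("/backups/example.img", segments, dest, namespace)
```

In this model only the data for references absent from the destination's cloud tier is sent, which mirrors the claim's exchange in which the destination sends a list of references and the source responds with data not previously moved. The metadata segment tree is reduced here to a flat set of references, a deliberate simplification.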