US 12,333,180 B2
Methods and apparatus for data processing and management in erasure-coded multi-cloud storage systems
Pak-Ching Lee, Shatin (HK); Hoi-Wan Chan, Shatin (HK); Shakeel Salamat Ullah, Shatin (HK); and Ng-Kwok Sing, Shatin (HK)
Assigned to Ng-Kwok Sing, Shatin (HK)
Filed by Ng-Kwok Sing, Shatin (HK)
Filed on May 22, 2023, as Appl. No. 18/321,367.
Prior Publication US 2024/0393975 A1, Nov. 28, 2024
Int. Cl. G06F 11/00 (2006.01); G06F 3/06 (2006.01); G06F 11/10 (2006.01)
CPC G06F 3/0655 (2013.01) [G06F 3/0604 (2013.01); G06F 3/067 (2013.01); G06F 11/1044 (2013.01)]
10 Claims
 
1. A data management system, comprising:
one or more proxies;
a plurality of agents;
a plurality of data containers;
a metadata store;
a plurality of networks;
wherein each of the said plurality of agents is connected with the said one or more proxies by one or more networks in the said plurality of networks;
wherein each of the said plurality of agents is connected with each other by one or more networks in the said plurality of networks;
wherein each of the said plurality of agents is connected with one or more of the said plurality of data containers by one or more networks in the said plurality of networks;
wherein the said one or more proxies are connected with the said metadata store by one or more networks in the said plurality of networks;
wherein the said one or more proxies are configured to: accept data storage requests, divide data in the data storage requests into units, apply data processing on the data in the data storage requests to generate processed chunks of data, disperse the processed chunks of data to one or more agents, revert or delete the stored processed chunks of data in the said plurality of data containers if the proxy fails to store all processed chunks of data from a data unit, and reply to a received data storage request;
wherein the said one or more proxies are further configured to: decide the processed chunks of data to retrieve from the said plurality of data containers, decode the chunks of data received from the said plurality of agents into the decoded chunks of data, and reply to a storage request with the decoded data;
wherein the said one or more proxies are further configured to send the decoded chunks of data to one or more agents in the said plurality of agents;
wherein the said one or more proxies are further configured to: record the checksums of processed chunks of data in the said metadata store, scan all or part of the metadata in the said metadata store at a pre-configured interval of time to identify the processed chunks of data in the said plurality of data containers that are not available through any of the said plurality of agents or whose checksums do not match the ones stored in the said metadata store, mark the identified chunks in the said metadata store, and trigger a data repair method to repair the chunks marked in the said metadata store;
wherein the said plurality of agents is configured to: receive processed chunks of data from the said one or more proxies, store the received chunks of data into one or more of the said plurality of data containers, and revert or delete the processed chunks of data in one or more containers in the said plurality of data containers;
wherein the said plurality of agents is further configured to: retrieve chunks of data from one or more containers in the said plurality of data containers, and send the chunks of data to a said proxy;
wherein the said plurality of agents is further configured to: check the storage utilization of a said data container, and report the storage utilization and storage capacity of a said data container;
wherein the said plurality of agents is further configured to: encode the retrieved chunks of data, send the encoded chunks of data to a said proxy or a said agent, and decode chunks of data received from the said plurality of agents into decoded chunks of data;
wherein the said metadata store is configured to store the states and metadata of data managed by the system;
wherein each of the said plurality of data containers contains storage space provisioned from a cloud storage service that provides data access through file-based or object-based storage protocols;
wherein the said plurality of data containers is configured to store the processed chunks of data in the system; and
wherein each of the said plurality of data containers is further configured with an identifier that is unique among all the said plurality of agents.
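
For illustration only, the write-with-rollback behavior of the proxies and the periodic checksum scan recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Agent` and `Proxy` classes, their method names, the in-memory dictionaries standing in for the data containers and the metadata store, the SHA-256 checksum, and the plain byte-splitting used in place of erasure coding are all hypothetical choices made for the example.

```python
import hashlib


def checksum(chunk: bytes) -> str:
    # Assumption: SHA-256; the claim does not name a checksum algorithm.
    return hashlib.sha256(chunk).hexdigest()


class Agent:
    """Hypothetical agent fronting one data container (a dict here)."""

    def __init__(self, container_id: str, fail_writes: bool = False):
        self.container_id = container_id   # unique container identifier
        self.container = {}                # chunk key -> chunk bytes
        self.fail_writes = fail_writes     # simulates an unreachable container

    def store(self, key: str, chunk: bytes) -> bool:
        if self.fail_writes:
            return False
        self.container[key] = chunk
        return True

    def retrieve(self, key: str):
        return self.container.get(key)     # None when the chunk is unavailable

    def delete(self, key: str) -> None:
        self.container.pop(key, None)


class Proxy:
    """Hypothetical proxy: disperses chunks, rolls back on failure, scans."""

    def __init__(self, agents):
        self.agents = agents
        self.metadata = {}  # chunk key -> {"container", "checksum", "marked"}

    def write_unit(self, unit_id: str, data: bytes) -> bool:
        n = len(self.agents)
        size = -(-len(data) // n) or 1     # ceiling division, at least 1 byte
        # Placeholder for data processing: plain splitting, no erasure coding.
        chunks = [data[i * size:(i + 1) * size] for i in range(n)]
        stored, pending = [], {}
        for i, (agent, chunk) in enumerate(zip(self.agents, chunks)):
            key = f"{unit_id}/{i}"
            if not agent.store(key, chunk):
                # Not all chunks of the unit were stored: delete the ones
                # already written and report failure.
                for done_agent, done_key in stored:
                    done_agent.delete(done_key)
                return False
            stored.append((agent, key))
            pending[key] = {"container": agent.container_id,
                            "checksum": checksum(chunk),
                            "marked": False}
        self.metadata.update(pending)      # record checksums only on success
        return True

    def scan(self):
        """Mark chunks that are unavailable or fail their checksum."""
        by_id = {a.container_id: a for a in self.agents}
        marked = []
        for key, entry in self.metadata.items():
            chunk = by_id[entry["container"]].retrieve(key)
            if chunk is None or checksum(chunk) != entry["checksum"]:
                entry["marked"] = True
                marked.append(key)
        return marked
```

In this sketch a call such as `Proxy([Agent("c0"), Agent("c1")]).write_unit("u1", b"data")` returns `True` only when every chunk has been stored, and `scan()` returns the keys it marked, which a repair routine (not shown) would then process. A real deployment would run `scan()` at the pre-configured interval and would replace the byte-splitting with an actual erasure code.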