US 12,353,357 B2
Systems and methods for formatting and reconstructing raw data
Raphael Glon, Guilers (FR); and Gaetan Cottereau, Kersaint-Plabennec (FR)
Assigned to OVH, Roubaix (FR)
Filed by OVH, Roubaix (FR)
Filed on May 5, 2021, as Appl. No. 17/308,159.
Claims priority of application No. 20315273 (EP), filed on May 28, 2020.
Prior Publication US 2021/0374102 A1, Dec. 2, 2021
Int. Cl. G06F 16/17 (2019.01); G06F 16/11 (2019.01)

CPC G06F 16/1727 (2019.01) [G06F 16/122 (2019.01)]

12 Claims

1. A method of formatting raw data, comprising:

accessing the raw data from a storage location on a first data storage device, the raw data containing sparse data segments and non-sparse data segments, the sparse data segments being empty of any data and the non-sparse data segments containing data;

reading the accessed raw data to locate the sparse data segments and the non-sparse data segments;

generating a formatted data stream comprising one or more formatted atomic blocks based on the read raw data, the one or more formatted atomic blocks having a maximum block size defined by a size of a destination device memory and associated with a metadata file, each one of the one or more formatted atomic blocks configured with a format that includes a field indicating a data size of the non-sparse data segments of the raw data, a field containing at least a portion of the non-sparse data segments, and a field containing the metadata file identifying specific beginning and end locations, in which the formatting of the one or more atomic blocks comprises:

when one of the atomic blocks does not exceed the maximum block size:

populate the metadata file of the one of the atomic blocks with offsets identifying a beginning of each of the located sparse data segments and an end of each of the located sparse data segments; and

populate the one of the atomic blocks with a concatenation of at least a portion of a first non-sparse segment of the raw data located before the sparse segment with at least a portion of a second non-sparse data segment of the raw data located after the sparse data segment; and

when one of the atomic blocks exceeds the maximum size:

populate the metadata file of another one of the atomic blocks with offsets indicative of the beginning of the located sparse data segment and the end of the located sparse data segment; and

populate the other one of the atomic blocks with a concatenation of the at least a portion of the first non-sparse data segment of the raw data located before the sparse data segment with the at least a portion of the second non-sparse data segment of the raw data located after the sparse data segment; and

reconstructing the raw data on a second data storage device from the formatted data stream, the second data storage device being distinct from the first data storage device, the reconstructing being based on the metadata file, the reconstructing comprising:

browsing the metadata file; and

iteratively reconstructing the raw data by writing a corresponding data content for each non-sparse data segment identified by the metadata file and executing a command to define a segment of zeros for each sparse data segment identified by the metadata file, wherein the segment of zeros comprises multiple zeros,

wherein one atomic block is streamed from the first data storage device to the second data storage device while the generating of the formatted data stream and the iterative reconstructing of the raw data are being executed in parallel.
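
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it is hypothetical: `segments`, `format_blocks`, `reconstruct`, `roundtrip`, the `HEADER` layout, the JSON encoding of the metadata, and the `min_run` threshold used to decide when a run of zero bytes counts as a sparse segment. The sketch also assumes that the maximum block size is expressed as a cap on the payload (`max_payload`), and it uses a lazy generator pipeline so that each block is consumed as soon as it is produced, which only approximates the parallel execution recited in the claim.

```python
import json
import struct
import tempfile
from typing import BinaryIO, Iterable, Iterator, Tuple

# Hypothetical block layout: payload size, metadata size, then metadata, then payload.
HEADER = struct.Struct(">QQ")


def segments(raw: bytes, min_run: int) -> Iterator[Tuple[int, int, bool]]:
    """Yield (begin, end, is_sparse) runs covering raw.

    Zero runs shorter than min_run are treated as ordinary data (an assumption).
    """
    pos = i = 0
    n = len(raw)
    while i < n:
        if raw[i] != 0:
            i += 1
            continue
        j = i
        while j < n and raw[j] == 0:
            j += 1
        if j - i >= min_run:
            if pos < i:
                yield pos, i, False
            yield i, j, True
            pos = j
        i = j
    if pos < n:
        yield pos, n, False


def pack(start: int, stop: int, holes: list, payload: bytearray) -> bytes:
    """Serialize one block: size fields, metadata (hole offsets), concatenated data."""
    meta = json.dumps({"start": start, "stop": stop, "holes": holes}).encode()
    return HEADER.pack(len(payload), len(meta)) + meta + bytes(payload)


def format_blocks(raw: bytes, max_payload: int, min_run: int = 4) -> Iterator[bytes]:
    """Yield blocks whose payload concatenates the data found around sparse segments."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    start, holes, payload = 0, [], bytearray()
    for begin, end, is_sparse in segments(raw, min_run):
        if is_sparse:
            holes.append([begin, end])  # record offsets only, no zero bytes are stored
            continue
        while begin < end:
            if len(payload) == max_payload:
                # Current block is full: emit it and continue in another block.
                yield pack(start, begin, holes, payload)
                start, holes, payload = begin, [], bytearray()
            take = min(max_payload - len(payload), end - begin)
            payload += raw[begin:begin + take]
            begin += take
    yield pack(start, len(raw), holes, payload)


def reconstruct(blocks: Iterable[bytes], out: BinaryIO) -> None:
    """Rebuild the raw data in a seekable binary file from the block stream."""
    stop = 0
    for block in blocks:
        data_size, meta_size = HEADER.unpack_from(block)
        meta_end = HEADER.size + meta_size
        meta = json.loads(block[HEADER.size:meta_end])
        payload = block[meta_end:meta_end + data_size]
        pos, used, stop = meta["start"], 0, meta["stop"]
        # A zero-length sentinel hole at "stop" flushes the data after the last real hole.
        for begin, end in meta["holes"] + [[stop, stop]]:
            out.seek(pos)
            out.write(payload[used:used + (begin - pos)])
            used += begin - pos
            pos = end  # skip the hole: seeking past it leaves a run of zeros
    out.truncate(stop)  # extends the file if the data ends with a hole


def roundtrip(raw: bytes, max_payload: int) -> bytes:
    """Format raw, rebuild it in a temporary file, and return the rebuilt bytes."""
    with tempfile.TemporaryFile() as out:
        reconstruct(format_blocks(raw, max_payload), out)
        out.seek(0)
        return out.read()
```

In this sketch, a call such as `roundtrip(b"ABCD" + bytes(8) + b"EFGHIJ" + bytes(4), 8)` produces two blocks. The first block joins `ABCD` (before the hole) with `EFGH` (after it) and records the hole offsets in its metadata; the second carries the remaining data and the trailing hole. Skipping holes with `seek` and extending with `truncate` stands in for the "command to define a segment of zeros"; whether the destination file system actually stores such regions sparsely depends on the platform.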