CPC G06F 3/0619 (2013.01) [G06F 3/064 (2013.01); G06F 3/067 (2013.01)]; 20 Claims
1. A system, comprising:
a distributed storage architecture including distributed storage comprised of storage devices managed by a plurality of nodes including a first node and a second node;
the first node hosting a first slice service that maintains a primary slice file populated with mappings from logical block addresses of logical blocks of a storage container within the distributed storage to block identifiers used to identify blocks within the storage devices storing data of the logical blocks, wherein the first slice service replicates changes made to the primary slice file to a first replica slice file maintained by a second slice service of the second node as a replica of the primary slice file, and wherein a second block of the first replica slice file is maintained as a replica of a first block of the primary slice file; and
a distributed metadata layer hosted across the plurality of nodes, wherein the distributed metadata layer includes the first slice service and the second slice service, and wherein the distributed metadata layer:
detects a storage device error affecting the first block storing a first block identifier of the primary slice file;
in response to detecting that the first replica slice file is a first dead replica slice file that is out of sync with the primary slice file, compares a first checksum of the first block with a second checksum of the second block storing a second block identifier of the first dead replica slice file to determine whether the first checksum and the second checksum match; and
in response to the first checksum and the second checksum matching, overwrites the first block with the second block to repair the primary slice file.
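The repair flow recited in claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the patented implementation: the names `Block`, `SliceFile`, `SliceService`, and `repair_from_dead_replica` are hypothetical, SHA-256 stands in for an unspecified checksum, primary and replica blocks are assumed to correspond by index, and "the first checksum" is read as the checksum recorded for the primary block when it was last written, since that block's current contents are unreadable after the device error.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


def checksum(data: bytes) -> str:
    # SHA-256 is a stand-in; the claim does not name a checksum algorithm.
    return hashlib.sha256(data).hexdigest()


@dataclass
class Block:
    data: Optional[bytes]  # serialized LBA -> block-ID mappings; None = unreadable
    expected: str          # checksum recorded when the block was last written

    def is_damaged(self) -> bool:
        # A storage device error shows up as missing or mismatching contents.
        return self.data is None or checksum(self.data) != self.expected


@dataclass
class SliceFile:
    blocks: List[Block] = field(default_factory=list)
    dead: bool = False  # True once a replica has fallen out of sync

    def write(self, index: int, data: bytes) -> None:
        if not 0 <= index <= len(self.blocks):
            raise IndexError("blocks are written in place or appended in order")
        block = Block(data, checksum(data))
        if index == len(self.blocks):
            self.blocks.append(block)
        else:
            self.blocks[index] = block


class SliceService:
    """Hypothetical primary slice service mirroring each write to one replica."""

    def __init__(self, primary: SliceFile, replica: SliceFile) -> None:
        self.primary = primary
        self.replica = replica

    def write(self, index: int, data: bytes) -> None:
        self.primary.write(index, data)
        if not self.replica.dead:  # a dead replica silently misses updates
            self.replica.write(index, data)


def repair_from_dead_replica(primary: SliceFile, replica: SliceFile, index: int) -> bool:
    """Repair primary block `index` from the same-index block of a dead replica."""
    first = primary.blocks[index]
    if not first.is_damaged() or not replica.dead or index >= len(replica.blocks):
        return False  # nothing to fix, or outside the path this sketch covers
    second = replica.blocks[index]
    if second.data is None:
        return False
    # The replica as a whole is stale, but this block is safe to copy if its
    # checksum equals the one recorded for the damaged primary block.
    if checksum(second.data) != first.expected:
        return False
    primary.blocks[index] = Block(second.data, first.expected)
    return True


# Example: the replica dies after block 1 is rewritten on the primary.
primary, replica = SliceFile(), SliceFile()
svc = SliceService(primary, replica)
svc.write(0, b"lba0->blk_a")
svc.write(1, b"lba1->blk_b")
replica.dead = True            # replica stops receiving updates
svc.write(1, b"lba1->blk_c")   # block 1 now differs between the copies
primary.blocks[0].data = None  # simulated device error on both primary blocks
primary.blocks[1].data = None
print(repair_from_dead_replica(primary, replica, 0))  # True
print(repair_from_dead_replica(primary, replica, 1))  # False
```

Comparing checksums block by block lets a replica that is out of sync overall still supply any block that has not changed since it fell behind, while a block whose checksum differs (block 1 above) is rejected rather than restored to a stale version.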