US 12,014,056 B2
Slice file recovery using dead replica slice files
Bharadwaj Vellore Ramesh, Sunnyvale, CA (US); Daniel McCarthy, Erie, CO (US); Christopher Lee Cason, Boulder, CO (US); and Ananthan Subramanian, San Ramon, CA (US)
Assigned to NetApp, Inc., San Jose, CA (US)
Filed by NetApp Inc., San Jose, CA (US)
Filed on Aug. 23, 2022, as Appl. No. 17/893,511.
Prior Publication US 2024/0069743 A1, Feb. 29, 2024
Int. Cl. G06F 3/06 (2006.01)
CPC G06F 3/0619 (2013.01) [G06F 3/064 (2013.01); G06F 3/067 (2013.01)]
20 Claims
 
1. A system, comprising:
a distributed storage architecture including distributed storage comprised of storage devices managed by a plurality of nodes including a first node and a second node;
the first node hosting a first slice service that maintains a primary slice file populated with mappings between logical block addresses of logical blocks of a storage container within the distributed storage to block identifiers used to identify blocks within the storage devices storing data of the logical blocks, wherein the first slice service replicates changes made to the primary slice file to a first replica slice file maintained by a second slice service of the second node as a replica of the primary slice file, and wherein a second block of the first replica slice file is maintained as a replica of a first block of the primary slice file; and
a distributed metadata layer hosted across the plurality of nodes, wherein the distributed metadata layer includes the first slice service and the second slice service, and wherein the distributed metadata layer:
detects a storage device error affecting the first block storing a first block identifier of the primary slice file;
in response to detecting that the first replica slice file is a first dead replica slice file that is out of sync with the primary slice file, compares a first checksum of the first block with a second checksum of the second block storing a second block identifier of the first dead replica slice file to determine whether the first checksum and the second checksum match; and
in response to the first checksum and the second checksum matching, overwrites the first block with the second block to repair the primary slice file.
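
The repair path recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the SliceFile class, the repair_from_dead_replica function, the use of CRC-32 as the checksum, and the idea that the primary keeps an expected checksum for each block separately from the block data are all assumptions made for illustration. The sketch only shows the recited control flow: a block of a dead (out-of-sync) replica is used to overwrite the damaged primary block only when the two checksums match.

```python
import zlib
from dataclasses import dataclass
from typing import List


def crc(data: bytes) -> int:
    """Checksum used in this sketch (CRC-32 is an assumption, not from the claim)."""
    return zlib.crc32(data)


@dataclass
class SliceFile:
    """Hypothetical slice file: blocks of serialized LBA -> block ID mappings."""
    blocks: List[bytes]
    expected: List[int]      # checksum recorded for each block when it was written
    is_dead: bool = False    # True when the replica is out of sync with the primary

    @classmethod
    def from_blocks(cls, blocks: List[bytes], is_dead: bool = False) -> "SliceFile":
        return cls(list(blocks), [crc(b) for b in blocks], is_dead)


def repair_from_dead_replica(primary: SliceFile, replica: SliceFile, index: int) -> bool:
    """Try to repair primary.blocks[index] from a dead replica.

    Returns True if the block was overwritten, False if the replica's
    block could not be trusted (checksum mismatch) or the replica is not dead.
    """
    if not replica.is_dead:
        return False  # this sketch covers only the dead-replica case
    first_checksum = primary.expected[index]         # checksum of the first block
    second_checksum = crc(replica.blocks[index])     # checksum of the second block
    if first_checksum != second_checksum:
        return False  # replica block is stale for this range; do not use it
    primary.blocks[index] = replica.blocks[index]    # overwrite to repair
    return True


# Usage: the dead replica is stale in block 1 but still current in block 0.
primary = SliceFile.from_blocks([b"lba0->blkA", b"lba1->blkB"])
dead = SliceFile.from_blocks([b"lba0->blkA", b"lba1->blkOLD"], is_dead=True)
primary.blocks[0] = b"\x00corrupt"                   # simulate a storage device error
repaired = repair_from_dead_replica(primary, dead, 0)
```

In this sketch a dead replica is not trusted as a whole; each block is validated individually, so a replica that fell out of sync can still supply the blocks that did not change after it went dead.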