US 12,430,215 B2
Use of cluster-level redundancy within a cluster of a distributed storage management system to address node-level errors
Wei Sun, Boulder, CO (US); Anil Paul Thoppil, Pleasanton, CA (US); and Anne Marie Vasu, Erie, CO (US)
Assigned to NetApp, Inc., San Jose, CA (US)
Filed by NetApp, Inc., San Jose, CA (US)
Filed on May 1, 2024, as Appl. No. 18/652,325.
Application 18/652,325 is a continuation of application No. 17/680,621, filed on Feb. 25, 2022, granted, now 11,983,080.
Claims priority of provisional application 63/279,892, filed on Nov. 16, 2021.
Prior Publication US 2024/0289240 A1, Aug. 29, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 11/10 (2006.01); G06F 3/06 (2006.01); G06F 11/16 (2006.01); G06F 11/30 (2006.01); G06F 16/27 (2019.01)

CPC G06F 11/1662 (2013.01) [G06F 3/0622 (2013.01); G06F 3/064 (2013.01); G06F 3/0679 (2013.01); G06F 11/1088 (2013.01); G06F 11/3034 (2013.01); G06F 16/27 (2019.01)]

19 Claims

1. A non-transitory machine readable medium storing instructions, which when executed by one or more processing resources of a distributed storage system comprising a cluster of a plurality of nodes, cause the distributed storage system to:

implement a redundancy scheme for data blocks stored by the distributed storage system by maintaining at least one redundant copy of a given data block of a first node of the plurality of nodes by at least one other node of the plurality of nodes;

based on encountering a number of block read errors associated with a particular Redundant Array of Independent Disks (RAID) stripe of a RAID group associated with the first node in which the number of block read errors meets or exceeds a predetermined or configurable threshold, identify the particular RAID stripe as a failed RAID stripe; and

restore a plurality of data blocks that were stored within the failed RAID stripe without performing RAID recovery/reconstruction by making use of the redundancy scheme.
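The following Python sketch illustrates the claimed flow under simplifying assumptions. It is not NetApp's implementation. Every name in it (`Node`, `StripeErrorMonitor`, `restore_failed_stripe`, the `threshold` default of 3) is hypothetical. A monitor counts block read errors per RAID stripe and flags the stripe as failed once the count meets or exceeds the threshold. The blocks of a failed stripe are then repopulated by copying redundant copies held by other nodes in the cluster, with no parity-based RAID reconstruction.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical storage node model (an assumption, not from the patent)."""
    name: str
    # Logical block store: block_id -> block contents.
    blocks: dict = field(default_factory=dict)
    # RAID layout of this node's RAID group: stripe_id -> block ids in the stripe.
    stripes: dict = field(default_factory=dict)


class StripeErrorMonitor:
    """Counts block read errors per RAID stripe and flags failed stripes."""

    def __init__(self, threshold: int = 3):
        # Hypothetical default; the claim only says the threshold is
        # "predetermined or configurable".
        self.threshold = threshold
        self.errors = defaultdict(int)
        self.failed = set()

    def record_read_error(self, stripe_id: str) -> bool:
        """Record one read error; return True once the stripe is marked failed."""
        self.errors[stripe_id] += 1
        if self.errors[stripe_id] >= self.threshold:  # "meets or exceeds"
            self.failed.add(stripe_id)
        return stripe_id in self.failed


def restore_failed_stripe(first: Node, stripe_id: str, peers: list) -> list:
    """Restore a failed stripe's blocks from redundant copies on peer nodes.

    No parity math is performed. Each block is copied from the first peer
    that holds a redundant copy, and the copy is written back into the first
    node's block store (in practice, to a healthy location).
    """
    restored = []
    for block_id in first.stripes[stripe_id]:
        for peer in peers:
            if block_id in peer.blocks:
                first.blocks[block_id] = peer.blocks[block_id]
                restored.append(block_id)
                break
        else:
            # No node in the cluster holds a copy of this block.
            raise LookupError(f"no redundant copy of block {block_id}")
    return restored


if __name__ == "__main__":
    primary = Node("node-1", stripes={"s0": ["b1", "b2"]})
    replica = Node("node-2", blocks={"b1": b"alpha", "b2": b"beta"})
    monitor = StripeErrorMonitor(threshold=2)
    monitor.record_read_error("s0")          # below threshold
    if monitor.record_read_error("s0"):      # threshold met: stripe failed
        print(restore_failed_stripe(primary, "s0", [replica]))
```

This sketch treats the threshold check and the restore as separate steps, so a real system could batch or schedule restores independently of error detection. It also assumes that every block in the stripe has at least one copy on another node, as the redundancy scheme in the claim requires.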