US 12,411,620 B2
Low hiccup time fail-back in active-active dual-node storage systems with large writes
Alexander Shknevsky, Fair Lawn, NJ (US); Oran Baruch, Tel Aviv (IL); Maor Rahamim, Ramla (IL); and Vamsi K. Vankamamidi, Hopkinton, MA (US)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Jun. 1, 2023, as Appl. No. 18/204,444.
Prior Publication US 2024/0402917 A1, Dec. 5, 2024
Int. Cl. G06F 3/06 (2006.01)

CPC G06F 3/0622 (2013.01) [G06F 3/0655 (2013.01); G06F 3/0679 (2013.01)]

20 Claims

1. A method of limiting or reducing storage accessibility hiccups in an active-active clustered system that performs user data chunk write operations, the active-active clustered system including a first storage node and a second storage node, the method comprising:

executing, by the first storage node of the active-active clustered system, a specialized recovery protocol for data and/or metadata associated with the user data chunk write operations, the first storage node being an active node, the specialized recovery protocol comprising, in response to the second storage node of the active-active clustered system transitioning from being an active node to an inactive node:

treating, by the first storage node, each large write request from among one or more large write requests from a host computer as a plurality of small write requests, the large write request corresponding to a request to write a respective user data chunk containing a plurality of data elements, each small write request corresponding to a request to write a respective data element from among the plurality of data elements, the first storage node and the second storage node having their own dedicated sub-ubers into which one or more user data chunks associated with one or more large write requests have been stored or ingested;

draining, by the first storage node, one or more dedicated sub-ubers associated with the first storage node;

returning, by the first storage node, the one or more dedicated sub-ubers associated with the first storage node to a computerized sub-uber manager;

draining, by the first storage node, one or more dedicated sub-ubers associated with the second storage node; and

returning, by the first storage node, the one or more dedicated sub-ubers associated with the second storage node to the computerized sub-uber manager,

whereby the first storage node takes responsibility for its own dedicated sub-ubers and those of the second storage node with regard to draining and returning them to the computerized sub-uber manager; and

having completed execution of the specialized recovery protocol, resuming, by the first storage node, normal treatment of large write requests from the host computer.
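
The sequence of steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class names (`SubUber`, `SubUberManager`, `StorageNode`), the method names, and the in-memory lists standing in for the small-write path and the drain destination are all hypothetical, and "draining" is assumed here to mean moving ingested chunks out of a sub-uber to some other store. The sketch only shows the ordering: switch large writes to per-element small writes, drain and return the surviving node's own sub-ubers, drain and return the peer's sub-ubers, then resume normal large-write handling.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SubUber:
    """Hypothetical extent dedicated to one node for ingesting large-write chunks."""
    uber_id: int
    chunks: list = field(default_factory=list)


class SubUberManager:
    """Hypothetical manager that hands out sub-ubers and records returned ones."""

    def __init__(self):
        self._ids = count()
        self.returned = []

    def allocate(self):
        return SubUber(next(self._ids))

    def give_back(self, sub_uber):
        self.returned.append(sub_uber.uber_id)


class StorageNode:
    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.sub_ubers = []        # sub-ubers dedicated to this node
        self.in_recovery = False
        self.small_write_log = []  # assumption: stand-in for the small-write path
        self.backing_store = []    # assumption: stand-in for the drain destination

    def write(self, chunk):
        """Handle a large write; a chunk is a list of data elements."""
        if self.in_recovery:
            # Treat the large write as one small write per data element.
            self.small_write_log.extend(chunk)
            return
        # Normal treatment: ingest the whole chunk into a dedicated sub-uber.
        if not self.sub_ubers:
            self.sub_ubers.append(self.manager.allocate())
        self.sub_ubers[0].chunks.append(list(chunk))

    def _drain_and_return(self, sub_ubers):
        for su in sub_ubers:
            self.backing_store.extend(su.chunks)  # drain
            su.chunks.clear()
            self.manager.give_back(su)            # return to the manager
        sub_ubers.clear()

    def on_peer_inactive(self, peer, writes_during_recovery=()):
        """Run the recovery sequence after the peer node becomes inactive."""
        self.in_recovery = True
        for chunk in writes_during_recovery:
            self.write(chunk)
        self._drain_and_return(self.sub_ubers)   # own sub-ubers first
        self._drain_and_return(peer.sub_ubers)   # then the peer's sub-ubers
        self.in_recovery = False                 # resume normal large writes
```

In this sketch the surviving node is the only party that drains and returns sub-ubers, both its own and the peer's, which mirrors the "whereby" clause of the claim. How a real system detects the peer's transition, or how host writes interleave with draining, is outside what the sketch models.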