US 11,720,267 B2
Maintaining a fault-tolerance threshold of a clusterstore during maintenance activities
Alkesh Shah, Sunnyvale, CA (US); Austin Kramer, Redwood City, CA (US); Leonid Livshin, Malden, MA (US); Ramses V. Morales, Sunnyvale, CA (US); and Brian Masao Oki, San Jose, CA (US)
Assigned to VMWARE, INC., Palo Alto, CA (US)
Filed by VMware, Inc., Palo Alto, CA (US)
Filed on Oct. 19, 2021, as Appl. No. 17/504,829.
Prior Publication US 2023/0118169 A1, Apr. 20, 2023
Int. Cl. G06F 3/06 (2006.01)
CPC G06F 3/0634 (2013.01) [G06F 3/0604 (2013.01); G06F 3/067 (2013.01); G06F 3/0619 (2013.01); G06F 3/0631 (2013.01)]
20 Claims
 
1. A method for maintaining fault tolerance in a storage cluster, comprising:
receiving, by a management component associated with a distributed data store on a cluster of host machines, a request to place a first host machine of the cluster of host machines in a maintenance mode, wherein the first host machine stores given data of the distributed data store;
after receiving the request, determining, by the management component, whether a second host machine that does not currently store any data of the distributed data store exists in the cluster of host machines;
determining, by the management component, based on whether the second host machine exists in the cluster of host machines, whether to transfer the given data of the distributed data store from the first host machine to the second host machine;
determining, by the management component, a number of failures to tolerate (FTT) of the distributed data store;
performing, based on the number of FTT of the distributed data store, at least one of:
decrementing, by the management component, the number of FTT of the distributed data store by one,
deactivating, by the management component, the distributed data store; or
recreating, by the management component, a state associated with the given data of the distributed data store on the second host machine; and
after determining whether to transfer the given data of the distributed data store from the first host machine to the second host machine, initiating, by the management component, the maintenance mode on the first host machine.
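
The sequence of steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Host`, `DataStore`, `find_spare_host`, and `enter_maintenance` are hypothetical, and the claim does not state how the number of FTT selects among the three actions. The sketch assumes one plausible policy: recreate the state on an empty host when one exists, otherwise decrement the FTT if it is above zero, otherwise deactivate the data store.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical model of a host machine in the cluster."""
    name: str
    stored_data: List[str] = field(default_factory=list)
    in_maintenance: bool = False


@dataclass
class DataStore:
    """Hypothetical model of the distributed data store."""
    ftt: int            # number of failures to tolerate
    active: bool = True


def find_spare_host(hosts: List[Host], leaving: Host) -> Optional[Host]:
    """Return a host that currently stores no data of the store, if any."""
    for host in hosts:
        if host is not leaving and not host.in_maintenance and not host.stored_data:
            return host
    return None


def enter_maintenance(store: DataStore, hosts: List[Host], leaving: Host) -> str:
    """Handle a maintenance-mode request for `leaving`; return the action taken."""
    # Determine whether a second host that stores no data exists.
    spare = find_spare_host(hosts, leaving)
    # Decide whether to transfer, based on whether that host exists.
    transfer = spare is not None
    # Determine the current number of FTT.
    ftt = store.ftt

    # ASSUMED policy (not specified by the claim) for choosing the action.
    if transfer:
        # Recreate the state for the given data on the second host.
        spare.stored_data.extend(leaving.stored_data)
        leaving.stored_data.clear()
        action = "recreated"
    elif ftt > 0:
        # No spare host: accept one fewer tolerated failure.
        store.ftt = ftt - 1
        action = "decremented"
    else:
        # No spare host and no tolerance left: deactivate the store.
        store.active = False
        action = "deactivated"

    # Only after the transfer decision is maintenance mode initiated.
    leaving.in_maintenance = True
    return action
```

In this sketch, maintenance mode is initiated only after the transfer decision and the FTT-based action have completed, which mirrors the ordering required by the final step of the claim.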