US 12,461,802 B1
Systems and methods for efficiently identifying failed computing nodes
Sean Elliott Roberts, Mountain View, CA (US); Ran Simha Biron, Santa Clara, CA (US); and Shishir Sharma, Mountain View, CA (US)
Assigned to Egnyte, Inc., Mountain View, CA (US)
Filed by Egnyte, Inc., Mountain View, CA (US)
Filed on Oct. 13, 2023, as Appl. No. 18/486,485.
Claims priority of provisional application 63/416,242, filed on Oct. 14, 2022.
Int. Cl. G06F 11/00 (2006.01); G06F 11/07 (2006.01)
CPC G06F 11/0724 (2013.01) [G06F 11/073 (2013.01); G06F 11/0757 (2013.01)]
49 Claims
 
1. A method for identifying failed computing nodes, said method comprising:
providing a plurality of computing nodes, each computing node of said plurality of computing nodes having access to at least one hardware processor and to shared data storage;
generating a first unique identifier particularly corresponding to a first particular node of said plurality of computing nodes;
generating a second unique identifier particularly corresponding to a second particular node of said plurality of computing nodes;
storing task information in said shared data storage, said task information indicative of a plurality of computing tasks each available to be completed by one of said plurality of computing nodes;
periodically updating node information in said shared data storage with new information associated with said first unique identifier;
accessing said shared data storage at a first time;
identifying a most recent update to said node information associated with said first unique identifier in said shared data storage;
determining whether said most recent update to said node information associated with said first unique identifier in said shared data storage occurred more than a threshold amount of time prior to said first time;
concluding, when said most recent update to said node information associated with said first unique identifier in said shared data storage occurred more than a threshold amount of time prior to said first time, that a first task identified by said task information as being processed by said first particular node is no longer being processed by said first particular node; and
processing said first task that is concluded to be no longer being processed by said first particular node with a second particular node of said plurality of computing nodes; and wherein
said step of determining that said most recent update to said node record in said shared data storage occurred more than a threshold amount of time prior to said first time is performed by said second node of said plurality of computing nodes.
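
The steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. It assumes an in-memory object standing in for the shared data storage, timestamps passed in explicitly as seconds, and hypothetical names (SharedStore, Node, heartbeat, claim, reclaim_stale_tasks) chosen only for illustration. Each node writes a periodic update under its own unique identifier, and a second node compares the most recent update against a threshold to decide whether to take over a task.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """In-memory stand-in for the shared data storage (an assumption)."""
    heartbeats: dict = field(default_factory=dict)  # node id -> time of last update
    tasks: dict = field(default_factory=dict)       # task name -> owning node id


class Node:
    """Hypothetical computing node with access to the shared store."""

    def __init__(self, store):
        self.store = store
        self.node_id = str(uuid.uuid4())  # unique identifier for this node

    def heartbeat(self, now):
        # Periodic update of node information under this node's identifier.
        self.store.heartbeats[self.node_id] = now

    def claim(self, task):
        # Record in the task information that this node is processing the task.
        self.store.tasks[task] = self.node_id

    def reclaim_stale_tasks(self, now, threshold):
        """Take over tasks whose owner last updated more than `threshold` ago."""
        taken = []
        for task, owner in list(self.store.tasks.items()):
            if owner == self.node_id:
                continue
            last_update = self.store.heartbeats.get(owner)
            # Only owners with a recorded update older than the threshold
            # are treated as no longer processing their tasks.
            if last_update is not None and now - last_update > threshold:
                self.store.tasks[task] = self.node_id
                taken.append(task)
        return taken
```

In this sketch the staleness check runs on the node that takes over the work, mirroring the final limitation of the claim. If a first node updates at time 0 and then goes silent, a second node calling reclaim_stale_tasks(now=30.0, threshold=10.0) would reassign the first node's task to itself, while a call at now=5.0 would leave it alone.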