| CPC G06F 11/0724 (2013.01) [G06F 11/073 (2013.01); G06F 11/0757 (2013.01)] | 49 Claims |

|
1. A method for identifying failed computing nodes, said method comprising:
providing a plurality of computing nodes, each computing node of said plurality of computing nodes having access to at least one hardware processor and to shared data storage;
generating a first unique identifier particularly corresponding to a first particular node of said plurality of computing nodes;
generating a second unique identifier particularly corresponding to a second particular node of said plurality of computing nodes;
storing task information in said shared data storage, said task information indicative of a plurality of computing tasks each available to be completed by one of said plurality of computing nodes;
periodically updating node information in said shared data storage with new information associated with said first unique identifier;
accessing said shared data storage at a first time;
identifying a most recent update to said node information associated with said first unique identifier in said shared data storage;
determining whether said most recent update to said node information associated with said first unique identifier in said shared data storage occurred more than a threshold amount of time prior to said first time;
concluding, when said most recent update to said node information associated with said first unique identifier in said shared data storage occurred more than a threshold amount of time prior to said first time, that a first task identified by said task information as being processed by said first particular node is no longer being processed by said first particular node; and
processing said first task that is concluded to be no longer being processed by said first particular node with a second particular node of said plurality of computing nodes; and wherein
said step of determining that said most recent update to said node record in said shared data storage occurred more than a threshold amount of time prior to said first time is performed by said second node of said plurality of computing nodes.
|
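The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes an in-memory object stands in for the shared data storage, that timestamps are plain numbers supplied by the caller, and that a task records the identifier of the node processing it. The names `SharedStorage`, `Node`, `heartbeat`, `claim`, and `reclaim_stale_tasks` are hypothetical and do not appear in the claim.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Hypothetical in-memory stand-in for the shared data storage."""
    # node identifier -> time of the most recent node-information update
    heartbeats: dict = field(default_factory=dict)
    # task name -> identifier of the node currently processing it
    tasks: dict = field(default_factory=dict)


class Node:
    """One computing node with access to the shared storage."""

    def __init__(self, storage):
        self.storage = storage
        # Unique identifier particularly corresponding to this node.
        self.node_id = str(uuid.uuid4())

    def heartbeat(self, now):
        # Periodic update of node information under this node's identifier.
        self.storage.heartbeats[self.node_id] = now

    def claim(self, task):
        # Record in the task information that this node is processing the task.
        self.storage.tasks[task] = self.node_id

    def reclaim_stale_tasks(self, now, threshold):
        """Take over tasks whose owner's last update is older than threshold.

        The staleness check runs on the node doing the takeover, as in the
        final clause of the claim.
        """
        reclaimed = []
        for task, owner in list(self.storage.tasks.items()):
            if owner == self.node_id:
                continue
            last = self.storage.heartbeats.get(owner)
            # Assumption: an owner with no recorded update is left alone.
            if last is not None and now - last > threshold:
                self.storage.tasks[task] = self.node_id
                reclaimed.append(task)
        return reclaimed


if __name__ == "__main__":
    storage = SharedStorage()
    first, second = Node(storage), Node(storage)
    first.heartbeat(now=0.0)
    first.claim("task-a")
    # The first node stops updating; the second node checks at time 40.
    second.heartbeat(now=40.0)
    print(second.reclaim_stale_tasks(now=40.0, threshold=30.0))  # ['task-a']
```

In this sketch the comparison `now - last > threshold` corresponds to the determining step, and reassigning the task to the checking node corresponds to the concluding and processing steps. A real deployment would need an atomic update in the shared storage so that two nodes cannot reclaim the same task at once; the claim does not specify that mechanism, so it is omitted here.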