US 12,001,296 B1
Continuous lock-minimal checkpointing and recovery with a distributed log-based datastore
Abhiram Kumar Hare Ram Singh, Billerica, MA (US); Theodore Allen Carroll, Seattle, WA (US); Nathaniel Vaughan Langman, Manchester by the Sea, MA (US); and Michael Anthony Sciscenti, Laurel, MD (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Sep. 30, 2022, as Appl. No. 17/937,013.
Int. Cl. G06F 11/14 (2006.01)
CPC G06F 11/1464 (2013.01) [G06F 11/1451 (2013.01); G06F 11/1469 (2013.01)]
20 Claims
 
1. A computer-implemented method comprising:
adding a node to a cluster of nodes utilizing a log-based datastore based on a determination to scale out the cluster or a determination that another node that was or is in the cluster has experienced a failure or performance issue;
updating a local state of the node to be current, wherein the local state is derived based on a sequence of updates to the log-based datastore, the updating comprising:
obtaining, by the node, checkpointed data from a durable datastore, wherein the checkpointed data corresponds to the local state;
updating, by the node, a local data structure based on the obtained checkpointed data;
attaching, by the node, to the log-based datastore at a specific location identified based on the checkpointed data; and
replaying one or more updates, obtained from the log-based datastore via the attachment, to further update the local data structure to be current; and
determining, by the node based on a leader election process, that the node is to act as a designated writer node of the cluster that controls the updating of the log-based datastore based on use of the local state; and
switching, by the node, into a designated writer mode of operation.
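
The recovery sequence recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (Checkpoint, Node, recover, maybe_become_writer, next_offset) is hypothetical; the log-based datastore is modeled as an in-memory list of key/value updates, the durable datastore as a plain object, and the leader election as an externally supplied node identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical snapshot of local state plus the log position it covers."""
    state: dict
    next_offset: int  # index of the first log entry NOT reflected in `state`


@dataclass
class Node:
    node_id: str
    state: dict = field(default_factory=dict)
    next_offset: int = 0
    mode: str = "follower"

    def recover(self, checkpoint, log):
        # Obtain checkpointed data and update the local data structure from it.
        self.state = dict(checkpoint.state)
        # Attach to the log at the location identified by the checkpoint.
        self.next_offset = checkpoint.next_offset
        # Replay later updates so the local state becomes current.
        for key, value in log[self.next_offset:]:
            self.state[key] = value
            self.next_offset += 1

    def maybe_become_writer(self, elected_leader_id):
        # Switch modes only if the (assumed external) election chose this node.
        if elected_leader_id == self.node_id:
            self.mode = "designated_writer"
        return self.mode


# Assumed example data: four updates in the log, checkpoint taken after two.
log = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
checkpoint = Checkpoint(state={"a": 1, "b": 2}, next_offset=2)

node = Node("node-2")
node.recover(checkpoint, log)
node.maybe_become_writer(elected_leader_id="node-2")
```

In this sketch the node replays only the two updates recorded after the checkpoint rather than the whole log, which is the purpose of attaching at the location identified by the checkpointed data.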