US 11,709,809 B1
Tree-based approach for transactionally consistent version sets
Bohou Li, Sunnyvale, CA (US); Vijayan Prabhakaran, Los Gatos, CA (US); Mehul A. Shah, Saratoga, CA (US); Benjamin Sowell, San Mateo, CA (US); and Douglas Brian Terry, San Francisco, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Mar. 29, 2021, as Appl. No. 17/216,359.
Int. Cl. G06F 16/00 (2019.01); G06F 16/21 (2019.01); G06F 16/22 (2019.01)
CPC G06F 16/219 (2019.01) [G06F 16/2246 (2019.01)] 20 Claims
 
1. A computer-implemented method comprising:
receiving, at a transactional access manager of a data lake management service, a request to begin an atomic transaction involving a metadata table of a data catalog, wherein the metadata table references data objects stored at a separate object storage service that make up the table;
receiving, at the transactional access manager, one or more commands to update metadata of the table as part of the transaction;
based at least in part on the one or more commands to update metadata of the table as part of the transaction:
generating, based on a first tree data structure storing metadata associated with a first version of the table, a second tree data structure associated with a second version of the table resulting from the one or more commands, wherein the first tree data structure includes one or more nodes that do not exist in the second tree data structure, wherein the second tree data structure includes one or more nodes that do not exist in the first tree data structure, and wherein the second tree data structure references one or more nodes of the first tree data structure; and
updating a version history data structure associated with the table to include a second version node associated with the second version, wherein the second version node references the second tree data structure and further references a first version node associated with the first version;
receiving, at the transactional access manager, a request to commit the transaction;
committing the transaction, comprising at least further updating the version history data structure;
receiving a query to be executed using the table, wherein the query includes or is associated with a time value indicating a point in time of the table that the query is to be executed against;
identifying the second version node within the version history data structure based on use of the time value or a transaction identifier provided with the query; and
executing the query using the second tree data structure referenced by the second version node.
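The steps recited in claim 1 can be illustrated with a minimal sketch, which is not the patented implementation. It assumes a path-copying (copy-on-write) binary search tree for the table metadata and a singly linked chain of version nodes for the version history. All names here (Node, VersionNode, Table, Transaction, insert, lookup, version_at) are hypothetical, the commit time is passed in by the caller for determinism, and conflict handling is reduced to a single head check.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node mapping a metadata key to an object location."""
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: str, value: str) -> Node:
    """Return a new root; only nodes on the search path are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # replace value in a fresh node


def lookup(root: Optional[Node], key: str) -> Optional[str]:
    """Plain binary-search lookup against one version's tree."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


@dataclass(frozen=True)
class VersionNode:
    """One entry in the version history; points at its tree and its predecessor."""
    txn_id: int
    commit_time: float
    tree: Optional[Node]
    parent: Optional["VersionNode"]


class Table:
    def __init__(self) -> None:
        self.head: Optional[VersionNode] = None  # newest committed version
        self.next_txn_id = 1

    def begin(self) -> "Transaction":
        return Transaction(self)

    def version_at(self, time_value: float) -> Optional[VersionNode]:
        """Walk back from the head to the newest version committed at or before time_value."""
        version = self.head
        while version is not None and version.commit_time > time_value:
            version = version.parent
        return version


class Transaction:
    def __init__(self, table: Table) -> None:
        self.table = table
        self.base = table.head  # version this transaction started from
        self.tree = table.head.tree if table.head else None

    def put(self, key: str, value: str) -> None:
        # Builds the next tree privately; committed trees are never mutated.
        self.tree = insert(self.tree, key, value)

    def commit(self, commit_time: float) -> VersionNode:
        # Assumption: simplistic conflict check, abort if another commit landed first.
        if self.table.head is not self.base:
            raise RuntimeError("conflicting commit; transaction must be retried")
        version = VersionNode(self.table.next_txn_id, commit_time, self.tree, self.base)
        self.table.next_txn_id += 1
        self.table.head = version  # publishing the version node makes the commit visible
        return version


# Usage: two committed versions, then a query against an earlier point in time.
table = Table()
txn = table.begin()
for key, location in [("m", "obj-m-1"), ("c", "obj-c-1"), ("t", "obj-t-1")]:
    txn.put(key, location)
v1 = txn.commit(commit_time=100.0)

txn = table.begin()
txn.put("t", "obj-t-2")
v2 = txn.commit(commit_time=200.0)

old_view = table.version_at(150.0)  # resolves to v1
new_view = table.version_at(250.0)  # resolves to v2
```

In this sketch the second tree has a new root and a new node for the updated key, the first tree keeps its own root and old node, and the untouched subtree is shared between both versions by reference. Because committed trees are immutable, a query that resolves a time value to an older version node reads a consistent snapshot without locking.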