US 12,147,412 B2
Concurrent optimistic transactions for tables with deletion vectors
Bart Samwel, Oegstgeest (NL); and Christos Stavrakakis, Berlin (DE)
Assigned to Databricks, Inc., San Francisco, CA (US)
Filed by Databricks, Inc., San Francisco, CA (US)
Filed on Jan. 18, 2023, as Appl. No. 18/156,109.
Claims priority of application No. 20230100021 (GR), filed on Jan. 13, 2023.
Prior Publication US 2024/0241877 A1, Jul. 18, 2024
Int. Cl. G06F 16/00 (2019.01); G06F 16/23 (2019.01)
CPC G06F 16/2315 (2019.01) [G06F 16/2358 (2019.01); G06F 16/2379 (2019.01)]
20 Claims
 
1. A method comprising:
accessing a plurality of records included in a first version of a data table, the data table including a first set of data files, each data file being associated with a deletion vector comprising a plurality of elements, each value of an element in the deletion vector indicating whether a corresponding record in the data file has been deleted;
receiving a first indication that a first transaction is committed to update a first subset of records in the data table at the first version, resulting in a second version of the data table with at least a first deletion vector;
receiving a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version of the data table, the first transaction and the second transaction being concurrent transactions;
determining whether a logical prerequisite of the concurrent transactions is satisfied based on whether the first subset of records changes content of one or more records in the second subset of records, wherein the determining comprises:
identifying whether values of elements in the first deletion vector that correspond to the second subset of records indicate that the content of the second subset of records is not changed in the first transaction;
determining whether a physical prerequisite is satisfied based on whether the second subset of records can be located in data files of the second version of the data table; and
committing the second transaction to generate a third version of the data table by updating elements of the deletion vector for the data file that corresponds to the second subset of records, responsive to determining that the logical prerequisite and the physical prerequisite are satisfied.
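
The commit check recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under simplifying assumptions, not the patented implementation: the names `TableVersion`, `logical_prerequisite`, `physical_prerequisite`, and `try_commit` are hypothetical, a deletion vector is modeled as a plain list of booleans (one per record, `True` meaning deleted), and an "update" is reduced to marking records as deleted in the deletion vector.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableVersion:
    """One snapshot of the table (hypothetical model).

    Maps each data file name to its deletion vector: a list with one
    boolean per record, where True means the record has been deleted.
    """
    number: int
    deletion_vectors: Dict[str, List[bool]] = field(default_factory=dict)


def logical_prerequisite(latest: TableVersion, file_name: str, rows: List[int]) -> bool:
    """True if no target record is marked deleted in the latest version,
    i.e. the earlier concurrent commit did not change those records."""
    dv = latest.deletion_vectors.get(file_name, [])
    return all(not dv[r] for r in rows if r < len(dv))


def physical_prerequisite(latest: TableVersion, file_name: str, rows: List[int]) -> bool:
    """True if the target records can still be located: the data file is
    present in the latest version and every row index is in range."""
    dv = latest.deletion_vectors.get(file_name)
    return dv is not None and all(0 <= r < len(dv) for r in rows)


def try_commit(latest: TableVersion, file_name: str, rows: List[int]) -> Optional[TableVersion]:
    """Commit against the latest version if both prerequisites hold.

    Returns the next table version, or None if the transaction conflicts.
    """
    if not (physical_prerequisite(latest, file_name, rows)
            and logical_prerequisite(latest, file_name, rows)):
        return None
    # Copy every deletion vector so earlier versions stay immutable.
    new_dvs = {name: list(dv) for name, dv in latest.deletion_vectors.items()}
    for r in rows:
        new_dvs[file_name][r] = True
    return TableVersion(latest.number + 1, new_dvs)


# Both transactions read version 1; the first one commits and produces version 2.
v1 = TableVersion(1, {"file_a": [False, False, False, False]})
v2 = try_commit(v1, "file_a", [0])

# The second transaction touches a different record, so it commits on top of version 2.
v3 = try_commit(v2, "file_a", [2])

# A transaction that targets record 0 again fails the logical prerequisite.
conflict = try_commit(v3, "file_a", [0])

# A transaction that targets a file absent from the latest version fails the physical prerequisite.
missing = try_commit(v3, "file_b", [1])
```

In this sketch the second transaction succeeds because the records it touches are disjoint from those changed by the first, which is the case the claim allows to commit without a retry. A real table format would also track newly written data files and rewritten files, which the sketch leaves out.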