US 12,242,448 B1
Columnar storage format for fast table updates
Austin Lee, Burbank, CA (US); and Vikram Jiandani, Los Angeles, CA (US)
Assigned to Rapid7, Inc., Boston, MA (US)
Filed by Rapid7, Inc., Boston, MA (US)
Filed on Feb. 10, 2022, as Appl. No. 17/668,440.
Int. Cl. G06F 16/21 (2019.01); G06F 16/22 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/219 (2019.01) [G06F 16/211 (2019.01); G06F 16/221 (2019.01); G06F 16/288 (2019.01)] 20 Claims
 
1. A system comprising:
one or more computing systems having one or more hardware processors and associated hardware memory that implements a database system, configured to:
store a table in a columnar format, wherein (a) individual columns of the table are stored as respective column files, (b) each column file groups values into a plurality of entity chunks based on an entity identifier, and (c) each column file includes an entity index that indicates respective locations of the entity chunks in the column file;
receive an update request to update one or more rows in the table, and in response:
determine a column file to be updated by the update request;
determine an entity chunk at a first location in the column file to be updated based on the entity index;
create an updated version of the entity chunk that replaces one or more old values with one or more new values according to the update request;
append the updated version of the entity chunk to the column file at a second location in the column file; and
update the entity index in the column file, wherein the update causes the entity chunk at the first location to become an obsolete version of the entity chunk in the column file and the updated version of the entity chunk at the second location to be a live version of the entity chunk in the column file; and
perform a compaction process to replace the column file with a new column file of a smaller size, wherein the compaction process includes to:
scan the entity index to identify one or more deleted or obsolete versions of one or more entity chunks in the column file; and
generate the new column file so that (a) the one or more deleted or obsolete versions of one or more entity chunks are removed in the new column file and (b) the entity index is updated in the new column file to indicate new locations of remaining entity chunks in the new column file.
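The update and compaction steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes an in-memory stand-in for a single column file, JSON-encoded chunks, and an entity index held as a list of entries beside the chunk bytes rather than serialized into the file. The names ColumnFile, IndexEntry, insert, update, delete, and compact are hypothetical and do not appear in the patent.

```python
import json
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    entity_id: str
    offset: int         # byte offset of the chunk within the column data
    length: int         # byte length of the chunk
    live: bool = True   # False once the chunk is obsolete or deleted


@dataclass
class ColumnFile:
    """Hypothetical in-memory model of one column file (assumption, not the patent's layout)."""
    data: bytearray = field(default_factory=bytearray)
    index: list = field(default_factory=list)

    def _append_raw(self, entity_id, payload):
        # Append chunk bytes at the end of the file and record their location.
        entry = IndexEntry(entity_id, len(self.data), len(payload))
        self.data += payload
        self.index.append(entry)
        return entry

    def _live_entry(self, entity_id):
        for entry in self.index:
            if entry.entity_id == entity_id and entry.live:
                return entry
        raise KeyError(entity_id)

    def _raw(self, entry):
        return bytes(self.data[entry.offset:entry.offset + entry.length])

    def insert(self, entity_id, values):
        self._append_raw(entity_id, json.dumps(values).encode("utf-8"))

    def read(self, entity_id):
        return json.loads(self._raw(self._live_entry(entity_id)).decode("utf-8"))

    def update(self, entity_id, changes):
        """Apply `changes` (position in chunk -> new value) by appending a new chunk version."""
        old = self._live_entry(entity_id)            # chunk at the first location
        values = json.loads(self._raw(old).decode("utf-8"))
        for position, new_value in changes.items():
            values[position] = new_value
        # The updated version is appended at a second location; the old bytes stay in place.
        self._append_raw(entity_id, json.dumps(values).encode("utf-8"))
        old.live = False                             # the old version is now obsolete

    def delete(self, entity_id):
        self._live_entry(entity_id).live = False

    def compact(self):
        """Return a new, smaller column file that keeps only live chunks."""
        new_file = ColumnFile()
        for entry in self.index:                     # scan the entity index
            if entry.live:
                # Copying the chunk records its new location in the new index.
                new_file._append_raw(entry.entity_id, self._raw(entry))
        return new_file
```

In this sketch an update never rewrites existing bytes, so its cost depends on the size of one entity chunk rather than the size of the column file. The space held by obsolete versions is reclaimed only when compact() builds the replacement file.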