US 12,292,872 B2
Compaction of documents in a high density data storage system
Sarath Lakshman, Mahe (IN); Apaar Gupta, Bangalore (IN); Rohan Ashok Suri, Jr., Mumbai (IN); Scott David Lashley, Portland, OR (US); John Sae Liang, Palo Alto, CA (US); Srinath Duvuru, Portland, OR (US); and David James Oliver Rigby, Manchester (GB)
Assigned to Couchbase, Inc., Santa Clara, CA (US)
Filed by Couchbase, Inc., Santa Clara, CA (US)
Filed on Jul. 18, 2023, as Appl. No. 18/223,541.
Claims priority of application No. 202241041486 (IN), filed on Jul. 20, 2022.
Prior Publication US 2024/0028596 A1, Jan. 25, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/22 (2019.01); G06F 12/02 (2006.01); G06F 16/23 (2019.01); G06F 16/2455 (2019.01); G06F 16/93 (2019.01)
CPC G06F 16/2246 (2019.01) [G06F 12/0253 (2013.01); G06F 16/2358 (2019.01); G06F 16/24552 (2019.01); G06F 16/24561 (2019.01); G06F 16/93 (2019.01)]
20 Claims
 
1. A computer-implemented method for maintaining data in a data management system, the computer-implemented method comprising:
storing a set of documents in a log-structured object store comprising sequence numbers and document values, wherein the log-structured object store maintains document sequence numbers and document values, the log-structured object store comprising a plurality of log segments;
storing a first log-structured merge-tree mapping keys to sequence numbers for accessing documents of the set of documents;
maintaining a delete list using a second log-structured merge-tree, the delete list comprising a list of stale document sequence numbers and corresponding sizes per log segment;
responsive to receiving a request to delete a document associated with a key,
identifying a sequence number of the deleted document from the first log-structured merge-tree based on the key value;
retrieving a size of the deleted document based on metadata of the deleted document stored in the log-structured object store based on the sequence number; and
recording the sequence number and the size of the deleted document in the second log-structured merge-tree;
for each log segment from the plurality of log segments, determining a measure of fragmentation of the log segment based on sizes of deleted documents of the log segment from the second log-structured merge-tree; and
responsive to the fragmentation exceeding a threshold, initiating a compaction operation for the log segment.
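
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: plain dictionaries stand in for both log-structured merge-trees, and the class name `Store`, its fields, the byte-based segment capacity, and the 0.5 threshold are all hypothetical choices made for illustration. Fragmentation is assumed here to be the ratio of stale bytes to bytes written in a segment; the claim only requires that the measure be based on the sizes of deleted documents.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy model of claim 1; dicts stand in for the two LSM trees (assumption)."""
    segment_bytes: int = 100      # hypothetical capacity of one log segment
    threshold: float = 0.5        # hypothetical fragmentation threshold
    next_seqno: int = 1
    current_segment: int = 0
    log: dict = field(default_factory=dict)           # seqno -> (segment, size, value)
    key_index: dict = field(default_factory=dict)     # key -> seqno (first LSM tree)
    delete_list: dict = field(default_factory=dict)   # segment -> {seqno: size} (second LSM tree)
    segment_size: dict = field(default_factory=dict)  # segment -> bytes written

    def put(self, key, value):
        # Assumption: overwriting a key makes the previous version stale.
        if key in self.key_index:
            self.delete(key)
        size = len(value)
        used = self.segment_size.get(self.current_segment, 0)
        # Start a new segment when the current one cannot hold the document.
        if used and used + size > self.segment_bytes:
            self.current_segment += 1
        seg = self.current_segment
        seqno = self.next_seqno
        self.next_seqno += 1
        self.log[seqno] = (seg, size, value)
        self.segment_size[seg] = self.segment_size.get(seg, 0) + size
        self.key_index[key] = seqno
        return seqno

    def delete(self, key):
        # Identify the sequence number from the key index.
        seqno = self.key_index.pop(key)
        # Retrieve the size from the document's metadata in the object store.
        seg, size, _value = self.log[seqno]
        # Record the sequence number and size in the per-segment delete list.
        self.delete_list.setdefault(seg, {})[seqno] = size

    def fragmentation(self, seg):
        # Stale bytes divided by bytes written to the segment.
        total = self.segment_size.get(seg, 0)
        if total == 0:
            return 0.0
        return sum(self.delete_list.get(seg, {}).values()) / total

    def segments_to_compact(self):
        # Segments whose fragmentation exceeds the threshold.
        return [s for s in sorted(self.segment_size)
                if self.fragmentation(s) > self.threshold]
```

In this sketch, `segments_to_compact()` only reports which segments would trigger compaction; the compaction operation itself (copying live documents out of the segment and reclaiming it) is left out, since claim 1 recites only initiating it.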