US 12,130,834 B1
Distributed appending of transactions in data lakes
Dimiter Dimitriev, Sofia (BG); Kostadin Georgiev, Sofia (BG); Abhishek Gupta, San Jose, CA (US); Christos Karamanolis, Los Gatos, CA (US); and Richard P. Spillane, Palo Alto, CA (US)
Assigned to VMware LLC, Palo Alto, CA (US)
Filed by VMware LLC, Palo Alto, CA (US)
Filed on Jan. 25, 2023, as Appl. No. 18/159,673.
Int. Cl. G06F 16/20 (2019.01); G06F 3/06 (2006.01); G06F 16/23 (2019.01); G06F 16/27 (2019.01)
CPC G06F 16/27 (2019.01) [G06F 3/0604 (2013.01); G06F 3/0643 (2013.01); G06F 3/067 (2013.01); G06F 16/2379 (2019.01)] 20 Claims
 
1. A computer-implemented method comprising:
receiving, at a first ingestion node of a plurality of ingestion nodes, as part of a transaction, a first message, the first message identifying a transaction identifier (ID), a first count of messages for the transaction, and a portion of data for the transaction;
persisting the data of the first message in temporary storage;
determining a second count of messages for the transaction for the first ingestion node;
based on at least the second count of messages, determining that the first ingestion node has received a complete set of messages for the transaction for the first ingestion node; and
transmitting, by the first ingestion node, to a coordinator, a metadata write request, the metadata write request identifying the transaction ID, the first count of messages, and the second count of messages, and including a self-describing reference to persisted data of the set of messages for the transaction for the first ingestion node, wherein the self-describing reference identifies the first ingestion node, location information of the persisted data, and a range of the first data.
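The flow recited in claim 1 can be sketched in Python as follows. This is a minimal, hypothetical illustration under stated assumptions, not the patented implementation. Claim 1 does not say how a node detects that its set of messages is complete, or what the coordinator does with the request. The following are therefore assumptions introduced only for this sketch: the per-node sequence number `seq`, the `is_last` end-of-stream flag, the `tmp/<node>/<txn>` location string, an in-memory dict standing in for temporary storage, and a coordinator rule that compares the sum of per-node second counts against the first count.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Message:
    txn_id: str
    first_count: int        # total messages in the transaction, across all nodes
    data: bytes             # this message's portion of the transaction data
    seq: int                # hypothetical: 0-based sequence number within this node's share
    is_last: bool = False   # hypothetical: marks the final message sent to this node


@dataclass(frozen=True)
class SelfDescribingRef:
    node_id: str                 # which ingestion node holds the persisted data
    location: str                # hypothetical location string for the persisted data
    data_range: Tuple[int, int]  # (start, end) byte offsets of the data


@dataclass(frozen=True)
class MetadataWriteRequest:
    txn_id: str
    first_count: int
    second_count: int
    ref: SelfDescribingRef


class IngestionNode:
    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        # Stand-in for temporary storage: txn_id -> {seq: chunk}.
        self.temp: Dict[str, Dict[int, bytes]] = {}
        self.last_seq: Dict[str, int] = {}

    def receive(self, msg: Message) -> Optional[MetadataWriteRequest]:
        chunks = self.temp.setdefault(msg.txn_id, {})
        chunks[msg.seq] = msg.data  # persist; a resent seq overwrites, not double-counts
        if msg.is_last:
            self.last_seq[msg.txn_id] = msg.seq
        second_count = len(chunks)  # messages this node holds for the transaction
        last = self.last_seq.get(msg.txn_id)
        # Complete once the final seq is known and every seq 0..last is present.
        if last is None or set(chunks) != set(range(last + 1)):
            return None
        size = sum(len(c) for c in chunks.values())
        ref = SelfDescribingRef(self.node_id, f"tmp/{self.node_id}/{msg.txn_id}", (0, size))
        return MetadataWriteRequest(msg.txn_id, msg.first_count, second_count, ref)


class Coordinator:
    """Hypothetical coordinator: treats a transaction as fully received once the
    per-node second counts add up to the first count (an assumption)."""

    def __init__(self) -> None:
        self.pending: Dict[str, List[MetadataWriteRequest]] = {}

    def metadata_write(self, req: MetadataWriteRequest) -> bool:
        reqs = self.pending.setdefault(req.txn_id, [])
        reqs.append(req)
        return sum(r.second_count for r in reqs) == req.first_count


# Transaction "t1" has 3 messages: two sent to node "a", one to node "b".
node_a, node_b, coord = IngestionNode("a"), IngestionNode("b"), Coordinator()
assert node_a.receive(Message("t1", 3, b"xy", seq=1, is_last=True)) is None  # seq 0 still missing
req_a = node_a.receive(Message("t1", 3, b"ab", seq=0))
req_b = node_b.receive(Message("t1", 3, b"z", seq=0, is_last=True))
print(req_a.second_count, req_a.ref.data_range)  # 2 (0, 4)
print(coord.metadata_write(req_a), coord.metadata_write(req_b))  # False True
```

In this sketch, each node sends both counts so that the coordinator can tell a partially ingested transaction from a complete one without contacting the nodes again. That reading is an interpretation for illustration only. The self-describing reference lets the coordinator record where each node's share of the data lives without copying the data itself.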