CPC G06F 16/90335 (2019.01) [G06F 16/24537 (2019.01); G06F 16/24542 (2019.01); G06F 16/24544 (2019.01)] | 20 Claims |
1. A computer-implemented method for executing a database query for performing join of data stored in a plurality of input tables, the computer-implemented method comprising:
receiving, by a database system executing on a cluster of servers, the database query specifying a join of a scan table with a hash table;
generating an execution plan for executing the database query, the execution plan including instructions for a plurality of join strategies;
allocating a set of threads for executing operators of the execution plan, each thread processing one or more operators, the set of threads comprising at least a first thread and a second thread, wherein the first thread communicates with the second thread via a blocking channel comprising a buffer of fixed capacity for storing data; and
processing the database query by streaming data through the operators of the execution plan, the processing comprising:
determining a size of the hash table,
selecting a join strategy from the plurality of join strategies based on the size of the hash table,
executing the selected join strategy comprising, communicating data by the first thread via the blocking channel to the second thread to distribute one of the scan table or the hash table by performing a distributed shuffle, and
performing, by a server, a join of the scan table with a part of the hash table mapped to the server.
|