CPC G06F 16/24537 (2019.01) [G06F 16/2255 (2019.01)] | 30 Claims |
1. A system comprising:
at least one hardware processor of a first server; and
at least one memory storing instructions that cause the at least one hardware processor to perform operations comprising:
generating at the first server, a plurality of build-side row sets using build-side row data received from a second server;
computing, at the first server, a plurality of hash values for a join operation using the build-side row data, the join operation using the plurality of build-side row sets as build-side input and a plurality of probe-side row sets of probe-side row data originating from a third server as probe-side input, data from the probe-side input being probed to detect a match with data from build-side input during the join operation;
sampling the plurality of hash values associated with the plurality of build-side row sets to detect a frequent hash value, the frequent hash value appearing at least a threshold number of times in the plurality of hash values for the build-side row data;
partitioning a build-side row set of the plurality of build-side row sets to generate a partitioned build-side row set, the build-side row set using the frequent hash value, and the partitioned build-side row set including a plurality of hash partitions of the build-side row data, a number of rows in the build-side row data being smaller than a number of rows in the probe-side row data after the partitioning of the build-side row set;
distributing the plurality of hash partitions to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of additional servers, each server of the plurality of additional servers executing one of the plurality of HJB instances using a hash partition of the plurality of hash partitions; and
outputting a result of the join operation based on completion of execution of the plurality of HJB instances at the plurality of additional servers.
|