CPC G06F 16/273 (2019.01) [A61F 5/566 (2013.01); G06F 9/4881 (2013.01); G06F 9/5016 (2013.01); G06F 9/5044 (2013.01); G06F 9/5083 (2013.01); G06F 9/5088 (2013.01); G06F 16/148 (2019.01); G06F 16/1827 (2019.01); G06F 16/211 (2019.01); G06F 16/221 (2019.01); G06F 16/2365 (2019.01); G06F 16/2456 (2019.01); G06F 16/2471 (2019.01); G06F 16/24532 (2019.01); G06F 16/24545 (2019.01); G06F 16/24552 (2019.01); G06F 16/254 (2019.01); G06F 16/27 (2019.01); G06F 16/283 (2019.01); G06F 16/951 (2019.01); G06F 16/9535 (2019.01); G06F 16/9538 (2019.01); H04L 67/1095 (2013.01); H04L 67/1097 (2013.01); H04L 67/568 (2022.05)] | 20 Claims |
1. A method, comprising:
receiving a relational join query comprising a join operation, an indication of a first relation and a second relation to be joined, and a predicate, wherein the first relation and the second relation are partitioned over processing nodes of a cluster;
determining, by a processing device prior to starting distribution of the first or second relation to a plurality of probe operators of a probe operation, whether to distribute the first relation to the probe operation using a broadcast join or to distribute the second relation to the probe operation using a re-partitioning join, wherein the determining is based at least in part on an estimated size of the second relation and a cost metric;
based on the determining, distributing the first relation or the second relation to the processing nodes of the cluster associated with the probe operation; and
performing, at the processing nodes of the cluster associated with the probe operation, the relational join query using at least one of a hash join, a sort-merge join, or a nested-loop join to generate a third relation that contains all combinations of tuples in the first relation and the second relation that satisfy the predicate.
|
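The steps recited in claim 1 can be illustrated with a minimal, single-process sketch. It is not the claimed implementation: the cost metric (rows moved over the network), the equality predicate, the use of exact row counts in place of size estimates, and every function name below are assumptions introduced for illustration only. The claim itself does not specify a cost formula, and it permits a sort-merge or nested-loop join as alternatives to the hash join shown here.

```python
from collections import defaultdict
from itertools import chain


def choose_strategy(est_first_rows, est_second_rows, num_nodes):
    """Pick a distribution strategy before any rows are moved.

    Hypothetical cost metric: number of rows sent over the network.
    Broadcasting copies the whole first relation to every probe node;
    re-partitioning shuffles each row of both relations once.
    """
    broadcast_cost = est_first_rows * num_nodes
    repartition_cost = est_first_rows + est_second_rows
    return "broadcast" if broadcast_cost <= repartition_cost else "repartition"


def repartition(partitions, key, num_nodes):
    """Route every row to the node selected by hashing its join key."""
    out = [[] for _ in range(num_nodes)]
    for row in chain.from_iterable(partitions):
        out[hash(row[key]) % num_nodes].append(row)
    return out


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Local equi-join: build a hash table, then probe it row by row."""
    table = defaultdict(list)
    for b in build_rows:
        table[b[build_key]].append(b)
    return [b + p for p in probe_rows for b in table.get(p[probe_key], [])]


def distributed_join(first_parts, second_parts, first_key, second_key):
    """Simulate the join over a cluster; each list entry is one node's partition."""
    num_nodes = len(second_parts)
    # Exact counts stand in for the optimizer's size estimates (assumption).
    est_first = sum(len(p) for p in first_parts)
    est_second = sum(len(p) for p in second_parts)
    strategy = choose_strategy(est_first, est_second, num_nodes)

    if strategy == "broadcast":
        # Every probe node receives the full first relation and keeps
        # its existing partition of the second relation.
        whole_first = list(chain.from_iterable(first_parts))
        first_at_node = [whole_first] * num_nodes
        second_at_node = second_parts
    else:
        # Both relations are re-partitioned so matching keys are co-located.
        first_at_node = repartition(first_parts, first_key, num_nodes)
        second_at_node = repartition(second_parts, second_key, num_nodes)

    result = []
    for f_rows, s_rows in zip(first_at_node, second_at_node):
        result.extend(hash_join(f_rows, s_rows, first_key, second_key))
    return strategy, result


if __name__ == "__main__":
    first = [[(1, "a")], [(2, "b")]]
    second = [[(1, "x"), (2, "y")], [(2, "z"), (3, "w")]]
    strategy, rows = distributed_join(first, second, 0, 0)
    print(strategy, sorted(rows))
```

In this sketch a small first relation is broadcast because copying it to every node moves fewer rows than shuffling both inputs. Once the first relation grows relative to the second, the same metric favors re-partitioning. Either way, the joined output (the "third relation" of the claim) contains the same tuples.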