US 12,353,418 B2
Handling null values in processing join operations during query execution
Samuel Peter Bove, Chicago, IL (US); Ellis Mihalko Saupe, University City, MO (US); Jason Arnold, Chicago, IL (US); and Andrew Park, St. Charles, IL (US)
Assigned to Ocient Holdings LLC, Chicago, IL (US)
Filed by Ocient Holdings LLC, Chicago, IL (US)
Filed on May 31, 2023, as Appl. No. 18/326,305.
Claims priority of provisional application 63/367,270, filed on Jun. 29, 2022.
Prior Publication US 2024/0004882 A1, Jan. 4, 2024
Int. Cl. G06F 16/2455 (2019.01); G06F 16/2453 (2019.01)
CPC G06F 16/2456 (2019.01) [G06F 16/24544 (2019.01)]
19 Claims
 
1. A method comprising:
determining a query that includes a join expression for execution against a database system;
dispersing a set of input rows for processing via a plurality of parallelized join processes in conjunction with executing the join expression based on:
identifying a first proper subset of the set of input rows based on a null-handling strategy;
dispersing first rows in the first proper subset for processing across the plurality of parallelized join processes in accordance with the null-handling strategy; and
dispersing second rows in a set difference between the set of input rows and the first proper subset across the plurality of parallelized join processes in accordance with a join key-based assignment strategy, wherein the second rows are dispersed differently from the first rows based on the join key-based assignment strategy being different from the null-handling strategy;
processing the set of input rows via the plurality of parallelized join processes, wherein each of the plurality of parallelized join processes receives and processes a corresponding subset of the set of input rows based on the dispersing of the set of input rows;
determining a second query that includes a second join expression for execution against the database system;
determining not to utilize the null-handling strategy for the second query;
dispersing a second set of input rows for processing via a second plurality of parallelized join processes in conjunction with executing the second join expression based on dispersing all of the second set of input rows across the second plurality of parallelized join processes in accordance with the join key-based assignment strategy based on determining not to utilize the null-handling strategy for the second query; and
processing the second set of input rows via the second plurality of parallelized join processes, wherein each of the second plurality of parallelized join processes receives and processes a corresponding subset of the second set of input rows based on the dispersing of the second set of input rows.
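
The dispersal recited in claim 1 can be illustrated with a minimal sketch. The claim does not say how the first proper subset is chosen or how either strategy assigns rows, so the following is assumed purely for illustration: the first proper subset is the rows whose join key is null, the null-handling strategy is a round-robin spread, and the join key-based assignment strategy is a hash of the key modulo the number of processes. The names `disperse`, `Row`, and `use_null_handling` are hypothetical and do not come from the patent.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical row shape: (join_key, payload). A join_key of None models a null.
Row = Tuple[Optional[int], Any]


def disperse(rows: List[Row], num_processes: int,
             use_null_handling: bool) -> Dict[int, List[Row]]:
    """Assign each input row to one of num_processes parallel join processes.

    Assumptions (not stated in the claim): null-keyed rows form the first
    proper subset and are spread round-robin; all other rows are assigned
    by hashing the join key.
    """
    assignments: Dict[int, List[Row]] = {p: [] for p in range(num_processes)}
    next_null_target = 0
    for row in rows:
        key = row[0]
        if use_null_handling and key is None:
            # Null-handling strategy (assumed): round-robin across processes.
            target = next_null_target
            next_null_target = (next_null_target + 1) % num_processes
        else:
            # Join key-based assignment strategy (assumed): hash partitioning.
            # With null handling off, every null key hashes to one process.
            target = hash(key) % num_processes
        assignments[target].append(row)
    return assignments


def null_counts(assignments: Dict[int, List[Row]]) -> List[int]:
    """Count null-keyed rows received by each process, in process order."""
    return [sum(1 for key, _ in assignments[p] if key is None)
            for p in sorted(assignments)]


rows: List[Row] = [(None, i) for i in range(6)] + [(k, "x") for k in (1, 2, 3, 4)]

# First query: null-handling strategy applied to the null-keyed subset.
first = disperse(rows, num_processes=3, use_null_handling=True)
print(null_counts(first))   # [2, 2, 2]

# Second query: null handling not used, so all rows go by join key.
second = disperse(rows, num_processes=3, use_null_handling=False)
print(null_counts(second))  # all six null-keyed rows land on a single process
```

Under these assumptions, the first call mirrors the first query of the claim, where the two subsets are dispersed differently, and the second call mirrors the second query, where every row follows the join key-based assignment strategy. The sketch also suggests one plausible reason for treating nulls separately: hashing a null key sends every such row to the same process, which can skew the load.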