CPC G06F 16/2471 (2019.01) [G06F 16/24526 (2019.01)] | 30 Claims |
1. A computer-implemented method, comprising:
receiving a query in a first query language to be applied to a set of data records;
parsing the query to identify multiple query stages;
generating, for each query stage of the multiple query stages, a sub-query in the first query language, wherein each sub-query is configured to cause each of multiple worker nodes to implement the query stage with respect to a subset of the set of data records obtained at the worker node, each sub-query representing a distinct executable query in the first query language that corresponds to a distinct query stage of the multiple query stages;
based on a determination that a first query stage of the multiple query stages corresponds to a first native operation and a determination that the multiple worker nodes are configured to execute the first native operation, generating one or more instructions to execute the first native operation;
based on a determination that a second query stage of the multiple query stages does not correspond to any native operation, determining not to generate one or more instructions to execute a native operation;
generating instructions for shuffling records between the multiple worker nodes at a point in time between at least two of the multiple query stages; and
communicating the instructions for shuffling records, the one or more instructions to execute the first native operation, and the sub-query corresponding to the second query stage of the multiple query stages, to the multiple worker nodes for concurrent implementation, wherein each worker node includes a distinct executor for processing sub-queries in the first query language.
|
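The planning steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Everything concrete in it is an assumption made for illustration: the pipe-delimited query syntax, the `NATIVE_OPS` and `SHUFFLE_BEFORE` tables, and the names `StagePlan`, `parse_stages`, `build_plan`, and `to_messages`. The sketch splits a query into stages, keeps one sub-query per stage, substitutes a native operation only when the stage maps to one and the workers support it, and marks where a shuffle would occur between stages.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical mapping from a stage keyword to a native worker operation.
NATIVE_OPS = {"filter": "native_filter", "stats": "native_aggregate"}
# Hypothetical set of stage keywords that need records shuffled beforehand.
SHUFFLE_BEFORE = {"stats"}


@dataclass
class StagePlan:
    sub_query: str                   # executable sub-query for this stage
    native_op: Optional[str] = None  # set only if a supported native op exists
    shuffle_before: bool = False     # shuffle records before this stage runs


def parse_stages(query: str) -> List[str]:
    """Split a pipe-delimited query into stages (assumed syntax)."""
    return [part.strip() for part in query.split("|") if part.strip()]


def build_plan(query: str, worker_native_ops: Set[str]) -> List[StagePlan]:
    """Create one StagePlan per stage, choosing native ops where possible."""
    plan = []
    for index, stage in enumerate(parse_stages(query)):
        keyword = stage.split()[0]
        candidate = NATIVE_OPS.get(keyword)
        # Use the native op only if the stage maps to one AND workers support it.
        native = candidate if candidate in worker_native_ops else None
        plan.append(StagePlan(
            sub_query=stage,
            native_op=native,
            shuffle_before=index > 0 and keyword in SHUFFLE_BEFORE,
        ))
    return plan


def to_messages(plan: List[StagePlan]) -> List[Tuple[str, str]]:
    """Flatten a plan into the instructions that would be sent to each worker."""
    messages = []
    for stage in plan:
        if stage.shuffle_before:
            # Payload names the stage that the shuffle precedes.
            messages.append(("shuffle", stage.sub_query))
        if stage.native_op is not None:
            messages.append(("native", stage.native_op))
        else:
            messages.append(("sub_query", stage.sub_query))
    return messages


# Example: workers here are assumed to support only the native filter.
plan = build_plan(
    "filter status=500 | eval kb=bytes/1024 | stats count by host",
    worker_native_ops={"native_filter"},
)
messages = to_messages(plan)
```

In this example the first stage is sent as a native instruction, the second stage has no native counterpart and is sent as a sub-query, and a shuffle instruction is placed before the final aggregating stage. A real system would send the same message list to every worker node so the stages run concurrently over each node's subset of records.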