CPC G06F 16/24568 (2019.01) [G06F 16/24528 (2019.01); G06F 16/2456 (2019.01)] | 15 Claims |
1. A system for processing queries on batch and streaming datasets, the system comprising:
a relational data store configured to ingest data from one or more data sources and store data in a first dataset and a second dataset;
a query generator configured to interpret a data expression in a simplified query language to generate a query in a structured query language based on a first quad corresponding to the first dataset and a second quad corresponding to the second dataset identified based on the data expression and determining an implicit join between the first quad and the second quad based on an unambiguous relationship obtainable from a schema of the first dataset and the second dataset, wherein the data expression does not expressly identify a join between the first quad and the second quad; and
a query processor configured to generate a query pipeline that uses the data of the first dataset and the second dataset stored by the relational data store to execute the query generated by the query generator,
wherein the query pipeline includes one or more compute nodes instantiated based on a query plan for a query, wherein the one or more compute nodes include alternating layers of faucets and turbines, and wherein an upstream faucet transmits pointers to the ingested data to one or more downstream turbines.
|
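The implicit-join element of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that the "unambiguous relationship obtainable from a schema" is a single foreign key linking the two datasets, and that the data expression is just a list of `dataset.column` references with no join stated. The names `ForeignKey`, `find_implicit_join`, and `generate_sql` are hypothetical, and the claim's "quads" are reduced here to plain dataset names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical schema entry: table.column references ref_table.ref_column."""
    table: str
    column: str
    ref_table: str
    ref_column: str


def find_implicit_join(schema, left, right):
    """Return the one foreign key linking left and right.

    Raises ValueError when there is no such key or more than one,
    because the relationship is then missing or ambiguous.
    """
    candidates = [fk for fk in schema if {fk.table, fk.ref_table} == {left, right}]
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one relationship between {left} and {right}, "
            f"found {len(candidates)}"
        )
    return candidates[0]


def generate_sql(schema, columns):
    """Build a SQL query from 'dataset.column' references (assumed expression form).

    The expression names no join; the join condition is derived from the schema.
    """
    tables = []
    for ref in columns:
        table = ref.split(".", 1)[0]
        if table not in tables:
            tables.append(table)
    if len(tables) != 2:
        raise ValueError("this sketch handles exactly two datasets")
    left, right = tables
    fk = find_implicit_join(schema, left, right)
    return (
        f"SELECT {', '.join(columns)} FROM {left} JOIN {right} "
        f"ON {fk.table}.{fk.column} = {fk.ref_table}.{fk.ref_column}"
    )


# Illustrative schema and expression (made-up dataset and column names).
schema = [ForeignKey("orders", "customer_id", "customers", "id")]
sql = generate_sql(schema, ["orders.total", "customers.name"])
print(sql)
```

In this sketch, a second foreign key between the same two datasets makes the relationship ambiguous, so `find_implicit_join` raises an error instead of guessing which key to join on.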