US 12,475,126 B1
Departmental allocation of query processing
Michael G. Coutts, San Diego, CA (US); Douglas P. Brown, Rancho Santa Fe, CA (US); and David Doyle Gilbreath, Escondido, CA (US)
Assigned to Teradata US, Inc., San Diego, CA (US)
Filed by Teradata US, Inc, San Diego, CA (US)
Filed on Sep. 30, 2024, as Appl. No. 18/902,262.
Int. Cl. G06F 15/16 (2006.01); G06F 16/2453 (2019.01)
CPC G06F 16/24545 (2019.01) 20 Claims
OG exemplary drawing
 
1. A method comprising:
executing, by a plurality of compute groups, a cloud based enterprise data warehouse (CBEDW) for an enterprise as a plurality of processes,
wherein at least one of the compute groups is cloud based,
wherein each compute group comprises one or more compute clusters, including a primary compute cluster and a plurality of secondary compute clusters,
wherein each compute cluster comprises one or more nodes,
wherein each node comprises at least one computer processor and a memory,
wherein the plurality of processes stores data on a data storage,
wherein at least a portion of the data storage is cloud based,
wherein the enterprise comprises a plurality of departments,
wherein at least some of the plurality of departments has control of a respective one or more the plurality of secondary compute clusters,
wherein the data storage is segmented into departmental portions and wherein at least some of the plurality of departments that have control of one or more of the plurality of secondary compute clusters has control of a respective departmental portion of the data storage, and
executing, by at least one of the plurality of processes, a database system;
receiving a query by the database system, wherein the query recites one or more query-recited tables, wherein each of the query-recited tables is stored in a respective departmental portion of the data storage;
analyzing the query by the database system to determine the resources required to execute the query by:
identifying the department that issued the query (the “source department”) and identifying the one or more secondary compute clusters controlled by the source department as a resource required to execute the query;
identifying the query-recited tables and determining the department by which each of the query-recited tables is controlled by:
accessing a database-level data dictionary that identifies tables in the database system, which department controls each table, and which departments have access to each table,
identifying the departmental portion of the data storage for each department that controls a query-recited table as a resource required to execute the query, and
identifying the secondary compute clusters controlled by each department that controls a query-recited table as a resource required to execute the query;
determining, by the database system, that the resources required to execute the query are controlled by more than one department: the source department and one or more accessed departments;
for each accessed department:
determining, by the database system, a first-strategy cost of a first strategy for processing the accessed department's portion of the query, wherein in the first strategy the query is re-written to have an accessed-department sub-query to be performed using the resources controlled by the accessed department to produce an accessed-department intermediate result set and a source-department sub-query to be performed using the resources controlled by the source department to incorporate the accessed-department intermediate result into a query result set, and
determining, by the database system, a second-strategy cost of a second strategy for processing the accessed department's portion of the query, wherein in the second strategy the query is executed by the one or more secondary compute clusters controlled by the source department to produce the query result set,
wherein determining the cost includes:
preparing, by the database system, a query execution plan,
determining the cost of the resources, and
considering whether data controlled by the accessed department is not accessible except to the accessed department;
for at least one of the accessed departments, choosing, by the database system, the first strategy because the first-strategy cost is lower than the second-strategy cost, and
for the accessed departments for which the first strategy was chosen, processing a portion of the query to be performed using resources controlled by the accessed department using the chosen first strategy to produce the accessed-department intermediate result set;
processing the query, by the secondary compute cluster controlled by the source department, the query incorporating the one or more accessed-department intermediate result sets from the one or more accessed departments, to produce a result set.