US 12,298,977 B1
Dynamic selection of database data topologies for performing queries
Fabian Oliver Nagel, Mountain View, CA (US); Chunbin Lin, Dublin, CA (US); Ippokratis Pandis, Menlo Park, CA (US); Naresh Chainani, Mountain View, CA (US); Mengchu Cai, San Jose, CA (US); Krishna Chaitanya Gudipati, Fremont, CA (US); Sanjay Wangoo, San Ramon, CA (US); Qichen Zhang, Newark, CA (US); and Nikolaos Armenatzoglou, New York, NY (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 30, 2022, as Appl. No. 17/810,293.
Int. Cl. G06F 15/16 (2006.01); G06F 16/2453 (2019.01)
CPC G06F 16/24545 (2019.01)
17 Claims
 
1. A system, comprising:
a processing cluster for a database, wherein the processing cluster comprises a plurality of computing nodes, and wherein the processing cluster is configured to:
receive a query directed to two or more tables of the database, wherein a first topology comprises a first distribution of physical data partitions amongst a first number of computing nodes of respective node types, wherein a second topology comprises a second distribution of the physical data amongst a second number of computing nodes of respective node types, wherein the first topology is different from the second topology, and wherein the first topology and the second topology are usable to make different assignments for processing different respective portions of data of the database to computing resources of the plurality of computing nodes, wherein one of the two or more tables is a shared table stored at another processing cluster, wherein the second topology is a foreign topology for the shared table that is obtained from a leader node of the other processing cluster;
generate different possible query plans to perform the query, wherein a first query plan of the different possible query plans assigns use of one of the first topology or the second topology differently than a second query plan of the different possible query plans;
determine that the first query plan is a lowest cost query plan according to a cost-based model that assigns costs to different uses of the first topology and the second topology;
execute the first query plan to perform the query, wherein to execute the first query plan to perform the query, the processing cluster is configured to obtain database data for the shared table from the other cluster or from a data store accessible to the other cluster.
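
As a non-limiting illustration, the plan generation and cost-based selection recited above can be sketched in Python. Every name here (Topology, QueryPlan, candidate_plans, plan_cost, choose_plan, REDISTRIBUTE_FACTOR) is hypothetical, and the cost formula is a toy assumption made for this sketch; the claim does not specify a cost model, and this is not the claimed implementation. The sketch enumerates plans that assign one of two topologies to each table, scores each plan, and returns the lowest cost plan.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical weight for moving rows between topologies (assumption).
REDISTRIBUTE_FACTOR = 0.5


@dataclass(frozen=True)
class Topology:
    name: str
    node_count: int   # number of computing nodes the data is spread over
    foreign: bool     # True if obtained from another cluster's leader node


@dataclass(frozen=True)
class QueryPlan:
    # One (table name, Topology) pair per table referenced by the query.
    assignment: tuple


def candidate_plans(tables, topologies):
    """Yield every plan that assigns one topology to each table."""
    for combo in product(topologies, repeat=len(tables)):
        yield QueryPlan(tuple(zip(tables, combo)))


def plan_cost(plan, rows):
    """Toy cost: per-node scan work, plus a penalty when topologies are mixed."""
    cost = sum(rows[table] / topo.node_count for table, topo in plan.assignment)
    distinct = {topo.name for _, topo in plan.assignment}
    if len(distinct) > 1:
        # Assume the smallest table is redistributed to align the two sides.
        cost += REDISTRIBUTE_FACTOR * min(rows[t] for t, _ in plan.assignment)
    return cost


def choose_plan(tables, topologies, rows):
    """Return the lowest cost plan under the toy cost model."""
    return min(candidate_plans(tables, topologies),
               key=lambda plan: plan_cost(plan, rows))


if __name__ == "__main__":
    local = Topology("local", node_count=4, foreign=False)
    foreign = Topology("foreign", node_count=8, foreign=True)
    rows = {"orders": 1_000_000, "shared_customers": 200_000}
    best = choose_plan(list(rows), [local, foreign], rows)
    for table, topo in best.assignment:
        print(table, "->", topo.name)
```

In this sketch, plans differ only in which topology is assigned to each table, which mirrors the recited distinction between the first query plan and the second query plan. A practical cost model would likely also account for node types and for the cost of obtaining the shared table's data from the other cluster or its data store.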