US 12,475,121 B2
Hybrid cost model for evaluating query execution plans
Seyed Mohammad Amin Kamali, Orleans (CA); Vincent Corvinelli, Mississauga (CA); Calisto Zuzarte, Pickering (CA); Brandon Lewis Frendo, Markham (CA); Vasiliki Kantere, Ottawa (CA); and Ning Wang, Ottawa (CA)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Nov. 13, 2023, as Appl. No. 18/507,299.
Prior Publication US 2025/0156417 A1, May 15, 2025
Int. Cl. G06F 16/2453 (2019.01)

CPC G06F 16/24542 (2019.01)

14 Claims

1. A computer-implemented method for evaluating query execution plans, the method comprising:

inputting, to a plurality of base cost models comprising a plurality of learned cost models and a classic cost model, a query and a search space comprising a plurality of candidate query execution plans, wherein the plurality of learned cost models are trained to estimate, for an input comprising a query and one or more plans, an execution time for each plan, wherein each of the plurality of learned cost models is separately trained to predict plan execution times for a unique query class of a plurality of query classes, wherein each of the plurality of query classes is defined based on at least one of: a level of complexity, a coverage of a database schema, data statistics, configuration, application, and workload characteristics, or a combination thereof, and wherein the classic cost model is configured to estimate an execution cost based on cardinalities of an input query plan's operators;

outputting, from each base cost model of the plurality of base cost models, a predicted execution time or cost for each plan of the plurality of candidate query execution plans;

determining a real execution time for each plan of the plurality of candidate query execution plans;

generating a training label comprising the query and a model of the base cost models having a highest correlation between the predicted execution time and the real execution time for the respective query; and

training a query classifier on training data comprising the training label to predict which base cost model of the plurality of base cost models is a most suitable cost model for planning a given query.
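
The labeling and classifier-training steps recited above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the claim does not say which correlation measure is used (Pearson is assumed here), the model names (`classic`, `learned_simple`, `learned_complex`), the two-element query feature vectors, and the sample numbers are hypothetical, and the nearest-centroid classifier is a simple stand-in for whatever query classifier an implementation would actually train.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation (an assumed measure); 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_model_label(predictions, real_times):
    """Pick the base cost model whose per-plan predictions correlate best
    with the measured execution times of the same plans."""
    return max(predictions, key=lambda m: pearson(predictions[m], real_times))


class NearestCentroidClassifier:
    """Stand-in query classifier: predicts the label with the closest mean feature vector."""

    def fit(self, features, labels):
        sums, counts = {}, {}
        for vec, lab in zip(features, labels):
            acc = sums.setdefault(lab, [0.0] * len(vec))
            for i, v in enumerate(vec):
                acc[i] += v
            counts[lab] = counts.get(lab, 0) + 1
        self.centroids = {
            lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()
        }
        return self

    def predict(self, vec):
        def dist2(centroid):
            return sum((a - b) ** 2 for a, b in zip(vec, centroid))
        return min(self.centroids, key=lambda lab: dist2(self.centroids[lab]))


# Hypothetical workload: (query features, per-model predictions per plan, real times per plan).
# Features are made-up [number of joins, number of predicates] pairs.
workload = [
    ([1, 2], {"classic": [10, 20, 30], "learned_simple": [3, 1, 2],
              "learned_complex": [5, 5, 6]}, [1.0, 2.1, 2.9]),
    ([2, 1], {"classic": [5, 9, 14], "learned_simple": [2, 3, 1],
              "learned_complex": [7, 7, 7]}, [0.5, 0.9, 1.6]),
    ([8, 6], {"classic": [30, 10, 20], "learned_simple": [6, 6, 7],
              "learned_complex": [4.0, 9.0, 6.5]}, [4.2, 8.8, 6.1]),
    ([9, 7], {"classic": [1, 2, 3], "learned_simple": [5, 5, 5],
              "learned_complex": [11, 4, 8]}, [12.0, 3.0, 7.0]),
]

# Each training label pairs a query (here, its feature vector) with its best base model.
features = [q for q, _, _ in workload]
labels = [best_model_label(preds, real) for _, preds, real in workload]

# Train the classifier to predict the most suitable base cost model for a new query.
clf = NearestCentroidClassifier().fit(features, labels)
simple_choice = clf.predict([1, 1])
complex_choice = clf.predict([7, 8])
```

In this toy data the classic model tracks real execution times best for the two small queries and one learned model does so for the two larger ones, so the classifier routes new queries accordingly. A rank correlation could be substituted for Pearson if only the relative ordering of plans matters; the claim leaves that choice open.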