| CPC G06F 16/24542 (2019.01) [G06F 11/3409 (2013.01); G06F 16/2282 (2019.01); G06F 16/278 (2019.01)] | 25 Claims |

|
1. A method comprising:
receiving a query to be executed over a plurality of data records, wherein the plurality of data records comprises a first set of data records stored in a first database distributed across a plurality of partitions and a second set of data records stored in a second database, wherein the second database comprises a different database architecture from the first database;
generating an input vector based at least in part on values for a plurality of features based on the query, the first database, and the second database;
processing the input vector using a machine learning (ML) model to predict a cost for executing the query if one or more data records in the second set of data records are loaded to a first partition of the plurality of partitions of the first database, wherein the ML model is trained to predict the cost of the query, comprising:
generating one or more output query costs by using a plurality of training features as inputs; and
adjusting one or more parameters of the ML model to reduce a difference between the one or more output query costs and one or more historical query costs; and
selecting a plan for loading the one or more data records from the second set of data records to the first partition based on the predicted cost for executing the query.
|
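The training and selection steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify the model family, the loss function, or the feature set, so everything concrete here is an assumption made for illustration: a linear model, gradient descent on mean squared error, and the hypothetical names `LinearCostModel`, `train`, `mean_squared_error`, and `select_plan`. Each feature vector stands in for the "input vector" built from the query, the first database, and the second database.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class LinearCostModel:
    """Hypothetical stand-in for the ML model: cost = w . x + b."""
    weights: List[float]
    bias: float = 0.0

    def predict(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def mean_squared_error(model: LinearCostModel,
                       features: Sequence[Sequence[float]],
                       costs: Sequence[float]) -> float:
    """Average squared difference between output costs and historical costs."""
    return sum((model.predict(x) - c) ** 2
               for x, c in zip(features, costs)) / len(costs)


def train(model: LinearCostModel,
          training_features: Sequence[Sequence[float]],
          historical_costs: Sequence[float],
          learning_rate: float = 0.05,
          epochs: int = 2000) -> LinearCostModel:
    """Assumed training procedure: batch gradient descent on MSE."""
    n = len(training_features)
    for _ in range(epochs):
        # Generate output query costs using the training features as inputs.
        outputs = [model.predict(x) for x in training_features]
        errors = [o - h for o, h in zip(outputs, historical_costs)]
        # Adjust parameters to reduce the difference between the output
        # costs and the historical costs.
        for j in range(len(model.weights)):
            grad = 2.0 / n * sum(e * x[j]
                                 for e, x in zip(errors, training_features))
            model.weights[j] -= learning_rate * grad
        model.bias -= learning_rate * 2.0 / n * sum(errors)
    return model


def select_plan(model: LinearCostModel,
                candidate_plans: Dict[str, Sequence[float]]) -> str:
    """Pick the loading plan whose input vector has the lowest predicted cost."""
    return min(candidate_plans,
               key=lambda name: model.predict(candidate_plans[name]))


# Synthetic data for illustration only: two made-up features per query.
training_features = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
historical_costs = [3.0, 4.0, 6.0, 8.0, 9.0]

model = train(LinearCostModel(weights=[0.0, 0.0]),
              training_features, historical_costs)

# Each candidate plan maps to the input vector it would produce.
plans = {"load_to_partition_0": [1, 0], "load_to_partition_1": [2, 1]}
print(select_plan(model, plans))
```

Selecting the minimum predicted cost is one plausible reading of "based on the predicted cost"; the claim language would also cover other selection rules, such as comparing the prediction against a threshold.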