| CPC G06F 16/24553 (2019.01) [G06F 16/24542 (2019.01)] | 8 Claims |

|
1. A cardinality estimation method for Skyline query based on deep learning, comprising:
obtaining the Skyline query on a target dataset and a corresponding cardinality by screening and parsing from historical query log information of a database;
constructing a training set based on the Skyline query and the corresponding cardinality;
constructing and training respective data distribution learning models according to distribution information of the target dataset and the training set;
constructing a cardinality estimation model, and using model parameters of trained data distribution learning models as initialization parameters of the cardinality estimation model to train the cardinality estimation model through the training set; and
inputting query points to obtain final cardinality estimates according to a trained cardinality estimation model,
wherein the step of obtaining the Skyline query on a target dataset and a corresponding cardinality by screening and parsing from historical query log information of a database comprises:
parsing and screening out the Skyline query on the target dataset and the corresponding cardinality from a query statement of the historical query log information, wherein the Skyline query comprises the query points and query parameters; and
parsing a scale of a query result set, that is, a query cardinality, from query results of the Skyline query,
wherein constructing and training of the cardinality estimation model comprises:
1) constructing a cardinality estimation sub model MQ based on the query points, wherein the cardinality estimation sub model MQ has a first deep neural network composed of a Transformer, a pooling layer and a linear connection layer, and an initial parameter is a parameter of a trained data distribution learning model on the target dataset;
2) constructing a cardinality estimation sub model MP based on the query parameters, wherein the cardinality estimation sub model MP has a second deep neural network composed of a Transformer, a pooling layer and a linear connection layer, and an initial parameter is a parameter of a trained data distribution learning model on the training set; and
3) encoding the query points and inputting encoded data into the first deep neural network for training to obtain the cardinality estimation sub model MQ, and splicing and encoding the query points and the query parameters and inputting encoded data into the second deep neural network for training to obtain the cardinality estimation sub model MP based on the query parameters, wherein the cardinality estimation sub model MP based on the query parameters always keeps the model parameters positive during training, and a goal of model optimization is to minimize an error between a weighted average of outputs of the two cardinality estimation sub models MP, MQ and a true cardinality value.
|
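
The joint training objective recited in step 3) can be illustrated with a minimal sketch. It is not the claimed implementation: plain linear models stand in for the two Transformer-based networks, and the equal weighting, the squared-error loss, the clamping used to keep the MP parameters positive, and all function names are assumptions made for illustration only.

```python
def linear(params, features):
    """Stand-in for a sub model: a dot product instead of a Transformer network."""
    return sum(w * x for w, x in zip(params, features))


def combined_estimate(est_q, est_p, weight_q=0.5):
    """Weighted average of the MQ and MP outputs (the weight value is assumed)."""
    return weight_q * est_q + (1.0 - weight_q) * est_p


def project_positive(params, eps=1e-6):
    """Clamp every parameter to at least eps so that all stay strictly positive.

    Clamping is one assumed way to keep the MP parameters positive.
    """
    return [max(p, eps) for p in params]


def train_step(params_q, params_p, point, point_and_params, true_card,
               lr=0.01, weight_q=0.5):
    """One gradient step on the squared error of the combined estimate."""
    est_q = linear(params_q, point)                # MQ sees the query point only
    est_p = linear(params_p, point_and_params)     # MP sees point plus parameters
    est = combined_estimate(est_q, est_p, weight_q)
    loss = (est - true_card) ** 2
    grad_out = 2.0 * (est - true_card)             # d(loss) / d(est)
    new_q = [w - lr * grad_out * weight_q * x
             for w, x in zip(params_q, point)]
    new_p = [w - lr * grad_out * (1.0 - weight_q) * x
             for w, x in zip(params_p, point_and_params)]
    # Only MP is constrained; MQ parameters may take any sign.
    return new_q, project_positive(new_p), loss


# Hypothetical toy data: a 2-d query point, one query parameter, true cardinality 3.
point = [0.2, 0.4]
point_and_params = point + [0.5]                   # "splicing" point and parameters
params_q, params_p = [0.1, -0.1], [0.1, 0.1, 0.1]
losses = []
for _ in range(50):
    params_q, params_p, loss = train_step(
        params_q, params_p, point, point_and_params, true_card=3.0)
    losses.append(loss)
```

In this sketch the projection runs after every update, so the MP parameters are positive at every point in training, while the loss is computed on the combined output rather than on either sub model alone.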