US 12,346,355 B2
Materialized column creation method and data query method based on data lake
Jun Guo, Beijing (CN); Youjun Zhang, Beijing (CN); Yi Xu, Beijing (CN); and Xuan Luo, Beijing (CN)
Assigned to Beijing Volcano Engine Technology Co., Ltd., Beijing (CN)
Filed by Beijing Volcano Engine Technology Co., Ltd., Beijing (CN)
Filed on Jun. 12, 2024, as Appl. No. 18/740,895.
Application 18/740,895 is a continuation of application No. PCT/CN2023/088334, filed on Apr. 14, 2023.
Claims priority of application No. 202210603047.1 (CN), filed on May 30, 2022.
Prior Publication US 2024/0330340 A1, Oct. 3, 2024
Int. Cl. G06F 16/334 (2025.01)
CPC G06F 16/334 (2019.01)
4 Claims
 
1. A method of querying data based on a data lake, comprising:
acquiring a first data query request triggered by a third user, wherein the first data query request is for requesting data query on a third data table;
rewriting, in accordance with a determination that at least one third materialized column exists in the third data table, the first data query request using materialized column description information of the at least one third materialized column to obtain a second data query request; and
performing data query on the third data table according to the second data query request;
wherein the third materialized column is created by:
acquiring a materialized column creation request triggered by a first user, wherein the materialized column creation request is for requesting to create the third materialized column in the third data table, the materialized column creation request carries a materialized expression of the third materialized column, and the materialized expression is for describing a data association relationship between the third materialized column and a target column in the third data table;
creating the third materialized column according to the materialized column creation request, wherein the third materialized column is used to replace the target column in providing data for a data query request carrying the materialized expression;
acquiring materialized column creation device description information; and
determining a creation device to be used based on the materialized column creation device description information;
wherein creating the third materialized column according to the materialized column creation request comprises:
determining a first creation task from the materialized column creation request;
translating the first creation task according to a task description language of the creation device to be used, to obtain a second creation task, wherein the second creation task is for implementing the creation of the third materialized column; and
sending the second creation task to the creation device to be used, to cause the creation device to be used to execute the second creation task;
wherein the first data query request comprises data query object description information, and the materialized column description information comprises the materialized expression and a materialized column identifier;
wherein rewriting the first data query request using the materialized column description information of the at least one third materialized column to obtain the second data query request comprises:
rewriting, in accordance with a determination that the materialized expression of at least one materialized column to be used of the at least one third materialized column matches at least one content to be used in the data query object description information, the first data query request using the materialized column identifier of at least one materialized column to be used to obtain the second data query request;
wherein the first data query request carries storage space query scope description information;
wherein the method further comprises:
determining at least one candidate materialized column set of the third data table based on the storage space query scope description information;
determining at least one intersection materialized column based on an intersection between the at least one candidate materialized column set; and
selecting at least one target materialized column from the at least one materialized column to be used using the at least one intersection materialized column; and
wherein rewriting the first data query request using the materialized column identifier of at least one materialized column to be used to obtain the second data query request comprises:
rewriting the first data query request using the materialized column identifier of the at least one target materialized column to obtain the second data query request.
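
The query-rewriting steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: the "data query object description information" is modeled as a list of selected expressions, the "storage space query scope description information" is modeled as a list of partition names, expression matching is exact string equality, and every name used here (`MaterializedColumn`, `rewrite_query`, `columns_by_partition`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaterializedColumn:
    """Materialized column description information (hypothetical model)."""
    identifier: str   # materialized column identifier
    expression: str   # materialized expression over the target column


def rewrite_query(select_exprs, partitions, columns, columns_by_partition):
    """Return the selected expressions of the second (rewritten) query request.

    select_exprs         -- expressions in the first query request (assumed form)
    partitions           -- storage scope of the query, assumed to be partitions
    columns              -- materialized columns defined on the table
    columns_by_partition -- partition name -> identifiers available in it
    """
    # Columns "to be used": the materialized expression matches queried content.
    to_use = [c for c in columns if c.expression in select_exprs]

    # One candidate set per partition in scope, then their intersection.
    candidate_sets = [set(columns_by_partition.get(p, ())) for p in partitions]
    intersection = set.intersection(*candidate_sets) if candidate_sets else set()

    # Target columns: to-be-used columns that are present in every partition.
    targets = {c.expression: c.identifier
               for c in to_use if c.identifier in intersection}

    # Replace each matched expression with the materialized column identifier.
    return [targets.get(expr, expr) for expr in select_exprs]


# Usage: "mc_age" is missing from one partition, so only "mc_city" is used.
CITY = "get_json_object(payload, '$.city')"
AGE = "get_json_object(payload, '$.age')"
columns = [MaterializedColumn("mc_city", CITY), MaterializedColumn("mc_age", AGE)]
by_partition = {"2024-01-01": {"mc_city", "mc_age"}, "2024-01-02": {"mc_city"}}
rewritten = rewrite_query(["id", CITY, AGE],
                          ["2024-01-01", "2024-01-02"], columns, by_partition)
print(rewritten)
```

In this sketch, taking the intersection means an expression is only replaced when its materialized column is available in every partition the query covers. Any expression that fails this check is left to be computed from the target column as in the first data query request.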