US 12,436,934 B2
Method and device for processing database tasks, hot and cold data
Teng Zhang, Hangzhou (CN); Jian Tan, Sunnyvale, CA (US); Jianying Wang, Hangzhou (CN); and Feifei Li, Hangzhou (CN)
Assigned to Alibaba Cloud Computing Co., Ltd., Alibaba (China) Co., Ltd., Hangzhou (CN)
Filed by Alibaba Cloud Computing Co., Ltd., Zhejiang (CN); and Alibaba (China) Co., Ltd., Zhejiang (CN)
Filed on Feb. 22, 2024, as Appl. No. 18/584,837.
Application 18/584,837 is a continuation of application No. PCT/CN2022/112559, filed on Aug. 15, 2022.
Claims priority of application No. 202110968775.8 (CN), filed on Aug. 23, 2021.
Prior Publication US 2024/0248886 A1, Jul. 25, 2024
Int. Cl. G06F 16/22 (2019.01); G06F 16/2455 (2019.01)
CPC G06F 16/2282 (2019.01) [G06F 16/2246 (2019.01); G06F 16/24552 (2019.01)]
20 Claims
 
1. A method for processing database tasks, applicable to a database storage engine integrated with a machine learning model configured to process a hot-and-cold data identification task in a data layering application scenario, wherein the database storage engine comprises K layers of persistent storage medium in the data layering application scenario, the method comprising:
monitoring, for any layer of persistent storage medium in the first M layers of the persistent storage medium, a usage rate of the persistent storage medium;
calling, in response to the usage rate of the persistent storage medium having reached a set usage rate threshold, the machine learning model to perform the hot-and-cold data identification task, and triggering a subsequent action based on a task execution result output by the machine learning model;
wherein the triggering the subsequent action based on the task execution result output by the machine learning model comprises: compressing cold data in the persistent storage medium and merging it into the next layer of the persistent storage medium based on the hot-and-cold data identification result, and prefetching hot data in the persistent storage medium to a memory of the database storage engine, wherein K is a positive integer greater than or equal to 2, and M is a positive integer less than K; and
determining, in response to the usage rate of the persistent storage medium having not reached the set usage rate threshold and a set model dynamic updating condition being met, target resource information and target sample data available for current model updating based on current load information of the database storage engine, and starting a background task to perform online updating on the machine learning model based on the target resource information and the target sample data.
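
The control flow recited in claim 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the names FrequencyModel, Layer, Engine and process_layer are hypothetical, the machine learning model is replaced by a toy access-count classifier, the usage rate is measured as entry count over capacity, and the "target resource information" is reduced to a simple budget of 1 minus the current load.

```python
import threading
import zlib
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class FrequencyModel:
    """Toy stand-in for the machine learning model (hypothetical)."""

    def __init__(self, cutoff: int = 2) -> None:
        self.cutoff = cutoff
        self.counts: Counter = Counter()

    def is_hot(self, key: str) -> bool:
        # A key counts as hot once it has been seen at least `cutoff` times.
        return self.counts[key] >= self.cutoff

    def update(self, samples: List[str]) -> None:
        # "Online updating" here just folds new access samples into the counts.
        self.counts.update(samples)


@dataclass
class Layer:
    capacity: int                                   # assumed unit: entries
    data: Dict[str, bytes] = field(default_factory=dict)

    def usage_rate(self) -> float:
        return len(self.data) / self.capacity


@dataclass
class Engine:
    layers: List[Layer]                             # the K layers
    m: int                                          # monitor the first M layers
    threshold: float                                # set usage rate threshold
    model: FrequencyModel
    samples: List[str] = field(default_factory=list)
    memory: Dict[str, bytes] = field(default_factory=dict)

    def __post_init__(self) -> None:
        k = len(self.layers)
        if k < 2 or not 1 <= self.m < k:
            raise ValueError("requires K >= 2 and 1 <= M < K")

    def process_layer(self, i: int, update_condition_met: bool,
                      load: float) -> Optional[threading.Thread]:
        if not 0 <= i < self.m:
            raise IndexError("only the first M layers are monitored")
        layer, nxt = self.layers[i], self.layers[i + 1]
        if layer.usage_rate() >= self.threshold:
            for key in list(layer.data):
                if self.model.is_hot(key):
                    # Prefetch hot data into the engine's memory.
                    self.memory[key] = layer.data[key]
                else:
                    # Compress cold data and merge it into the next layer.
                    # (The sketch does not track whether a value is already compressed.)
                    nxt.data[key] = zlib.compress(layer.data.pop(key))
            return None
        if update_condition_met:
            # Assumed policy: the busier the engine, the fewer samples are used.
            budget = max(0.0, 1.0 - load)
            chosen = self.samples[: int(budget * len(self.samples))]
            task = threading.Thread(target=self.model.update,
                                    args=(chosen,), daemon=True)
            task.start()                            # background model update
            return task
        return None
```

Because M is less than K, every monitored layer has a next layer to merge into, which is why the sketch validates that constraint at construction time. The returned thread handle lets a caller wait for the background update if needed.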