US 12,135,621 B1
Data relevance-based data retention in data lakehouses
Binoy Thomas, Kozhikode (IN); Sudheesh S. Kairali, Kozhikode (IN); and Sarbajit K. Rakshit, Kolkata (IN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Sep. 25, 2023, as Appl. No. 18/473,527.
Int. Cl. G06F 11/14 (2006.01); G06F 16/21 (2019.01)
CPC G06F 11/1469 (2013.01) [G06F 16/217 (2019.01); G06F 2201/80 (2013.01)]
17 Claims
 
1. A computer-implemented method for data relevancy-based data retention, the computer-implemented method comprising:
performing, by a computer, using a machine learning model, an analysis of a data retention relevancy value assigned to data;
determining, by the computer, using the machine learning model, whether the data retention relevancy value of the data needs to be adjusted based on the analysis;
adjusting, by the computer, using the machine learning model, the data retention relevancy value of the data in accordance with the analysis in response to the computer, using the machine learning model, determining that the data retention relevancy value of the data does need to be adjusted based on the analysis;
assigning, by the computer, the data to a logical data retention relevancy compartment of a plurality of logical data retention relevancy compartments in a data lakehouse;
determining, by the computer, whether the data retention relevancy value of the data is greater than a data retention threshold level of a data retention policy corresponding to the logical data retention relevancy compartment storing the data; and
transferring, by the computer, the data from a physical data storage unit corresponding to the logical data retention relevancy compartment of the plurality of logical data retention relevancy compartments in the data lakehouse to an archive in response to the computer determining that the data retention relevancy value of the data is greater than the data retention threshold level of the data retention policy corresponding to the logical data retention relevancy compartment storing the data.
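
For illustration only, the steps of claim 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the names DataItem, Compartment, process, pick_compartment, and tolerance are hypothetical, the machine learning model is stood in for by any callable that returns a relevancy value, and the archive is a plain list. The comparison direction (archive when the value is greater than the threshold) follows the wording of the claim.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataItem:
    name: str
    relevancy: float  # data retention relevancy value currently assigned


@dataclass
class Compartment:
    # Hypothetical stand-in for a logical data retention relevancy
    # compartment together with its retention policy threshold.
    name: str
    threshold: float
    items: List[DataItem] = field(default_factory=list)


def process(
    item: DataItem,
    model: Callable[[DataItem], float],  # assumed stand-in for the ML model
    compartments: Dict[str, Compartment],
    pick_compartment: Callable[[DataItem], str],  # assumed assignment rule
    archive: List[DataItem],
    tolerance: float = 1e-9,  # assumed criterion for "needs adjusting"
) -> str:
    # Analyze the assigned value, decide whether it needs adjusting,
    # and adjust it in accordance with the analysis.
    predicted = model(item)
    if abs(predicted - item.relevancy) > tolerance:
        item.relevancy = predicted

    # Assign the data to one of the logical compartments.
    compartment = compartments[pick_compartment(item)]
    compartment.items.append(item)

    # Compare against the compartment's policy threshold; per the claim
    # wording, a value greater than the threshold triggers the transfer.
    if item.relevancy > compartment.threshold:
        compartment.items.remove(item)
        archive.append(item)
        return "archived"
    return "retained"
```

With a compartment whose threshold is 0.8 and a stub model that returns 0.9 for an item, process moves that item to the archive list; an item scored 0.2 stays in the compartment. How the model derives its score and how compartments map to physical data storage units are left open here, since claim 1 does not specify them.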