| CPC G06F 21/6218 (2013.01) [G06F 16/125 (2019.01); G06F 2221/2101 (2013.01); G06F 2221/2141 (2013.01)] | 20 Claims |

|
1. A computer-implemented method of managing a lifecycle of data processed through a plurality of stages in a system using content-based datasets, comprising:
identifying data objects of disparate file formats that are subject to same control rules in each stage of the lifecycle as grouped data, wherein the control rules provide access only to authorized users or perform only authorized operations on the grouped data based on a current stage of the lifecycle, and further wherein the data objects are protected by different data protection policies utilizing the control rules;
generating a dataset for the grouped data by scanning the data objects to identify metadata of the grouped data to be processed similarly within the lifecycle, and storing the identified metadata in the dataset, wherein the lifecycle includes a backup operation implementing the data protection policies;
iteratively processing the dataset to tag the data objects according to a native file format;
attaching multiple tags to the dataset to indicate that the data objects of the dataset are of different file types according to the disparate file formats;
merging the protection policies to back up the dataset under a merged protection policy;
associating the control rules to the grouped data as stage tags for the dataset;
monitoring actions performed on and by the data objects referenced by the dataset in each stage of the lifecycle; and
ensuring that the monitored actions comply with control rules using the stage tags of the dataset.
|
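The steps recited in claim 1 can be illustrated with a short Python sketch. It is not the patented implementation. Every name in it (`ControlRule`, `ProtectionPolicy`, `DataObject`, `Dataset`, `merge_policies`, `build_dataset`, `violations`) is hypothetical. It assumes the grouping of data objects under common control rules has already been done. It also assumes that "merging" keeps the strictest value of each policy setting, which the claim does not specify, and that the native file format can be read from the file extension.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Optional


@dataclass(frozen=True)
class ControlRule:
    """Hypothetical control rule: who may do what in one lifecycle stage."""
    stage: str
    allowed_users: frozenset
    allowed_ops: frozenset


@dataclass(frozen=True)
class ProtectionPolicy:
    """Hypothetical data protection (backup) policy."""
    name: str
    retention_days: int
    backups_per_day: int


@dataclass
class DataObject:
    path: str
    policy: ProtectionPolicy


@dataclass
class Dataset:
    metadata: dict = field(default_factory=dict)    # path -> identified metadata
    format_tags: set = field(default_factory=set)   # one tag per file format
    stage_tags: dict = field(default_factory=dict)  # stage -> ControlRule
    policy: Optional[ProtectionPolicy] = None       # merged protection policy


def merge_policies(policies):
    """Assumed merge strategy: keep the strictest value of each setting."""
    if not policies:
        raise ValueError("at least one protection policy is required")
    return ProtectionPolicy(
        name="+".join(sorted({p.name for p in policies})),
        retention_days=max(p.retention_days for p in policies),
        backups_per_day=max(p.backups_per_day for p in policies),
    )


def build_dataset(objects, rules):
    """Scan grouped objects, store metadata, and attach format and stage tags."""
    ds = Dataset()
    for obj in objects:
        # Assumption: the native file format is taken from the file extension.
        fmt = PurePath(obj.path).suffix.lstrip(".").lower() or "unknown"
        ds.metadata[obj.path] = {"format": fmt, "policy": obj.policy.name}
        ds.format_tags.add(fmt)
    ds.policy = merge_policies([obj.policy for obj in objects])
    ds.stage_tags = {rule.stage: rule for rule in rules}
    return ds


def violations(ds, actions):
    """Return the monitored (stage, user, op) actions that break the stage tags."""
    bad = []
    for stage, user, op in actions:
        rule = ds.stage_tags.get(stage)
        if rule is None or user not in rule.allowed_users or op not in rule.allowed_ops:
            bad.append((stage, user, op))
    return bad


# Usage with made-up objects, policies, users, and stages.
objects = [
    DataObject("reports/q1.pdf", ProtectionPolicy("gold", 365, 4)),
    DataObject("reports/q1.xlsx", ProtectionPolicy("silver", 90, 1)),
]
rules = [
    ControlRule("active", frozenset({"alice", "bob"}), frozenset({"read", "write", "backup"})),
    ControlRule("archive", frozenset({"alice"}), frozenset({"read"})),
]
ds = build_dataset(objects, rules)
monitored = [("active", "bob", "write"), ("archive", "bob", "read")]
flagged = violations(ds, monitored)
```

In this sketch the dataset stores only metadata and tags rather than the data objects themselves, so a compliance check for a given stage is a lookup against the stage tags. An action in a stage that has no stage tag is treated as non-compliant, which is a conservative assumption rather than something the claim states.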