| CPC G06F 16/2462 (2019.01) [G06F 11/3409 (2013.01); G06F 16/24564 (2019.01); G06F 16/285 (2019.01)] | 16 Claims | 

| 
               1. A computer-implemented method of optimizing storage placement and access to content data in a data processing system using content-based datasets, comprising: 
            executing a processor-based search engine to scan content data stored in a database to identify metadata associated with data elements to be monitored on the basis of monitoring attributes including usage and access patterns; 
                gathering the identified metadata for storage in a data catalog; 
                executing, by the search engine, a user-entered query against the metadata in the data catalog to automatically generate a dataset comprising metadata selectors as tags for matching the identified metadata; 
                defining rules based on the monitoring attributes, wherein a rule dictates a storage location of data or access permissions to the data by one or more persons or groups in the system and a subjectively scaled characterization of the data as important in a current project by the one or more persons; 
                applying a protection policy to the dataset to protect the content data, wherein the dataset spans multiple storage devices of different storage types, and wherein the dataset defines a single data access unit for the referenced data elements from the database, and further wherein the monitoring attributes are applied to the referenced data elements as a single unit based on data content rather than data location, and yet further wherein the content data comprises archive data classified into multiple datasets, each having different protection policies; 
                recursively processing the archive data to attach corresponding tags to each dataset; 
                merging policies of the corresponding tags based on a most restrictive policy of the different protection policies for the archive data to define the protection policy; and 
                executing a processor-based backup process using the protection policy to store the archive data in storage media. 
               |
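
The recursive tagging and policy-merging steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The claim does not say how "most restrictive" is measured, so the sketch assumes a policy with three hypothetical fields (`retention_days`, `backup_copies`, `encrypted`) and takes the strictest value of each. The names `ProtectionPolicy`, `ArchiveNode`, `attach_tags`, and `merge_most_restrictive`, and the selector predicates, are likewise assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProtectionPolicy:
    # Hypothetical policy fields; the claim does not enumerate them.
    retention_days: int
    backup_copies: int
    encrypted: bool


@dataclass
class ArchiveNode:
    # One element of archive data; children model nested archive members.
    name: str
    metadata: dict
    children: list = field(default_factory=list)
    tags: set = field(default_factory=set)


def attach_tags(node, selectors):
    """Recursively tag a node and its descendants; return every tag found."""
    for tag, predicate in selectors.items():
        if predicate(node.metadata):
            node.tags.add(tag)
    found = set(node.tags)
    for child in node.children:
        found |= attach_tags(child, selectors)
    return found


def merge_most_restrictive(policies):
    """Combine policies field by field, keeping the strictest value of each."""
    policies = list(policies)
    if not policies:
        raise ValueError("at least one policy is required")
    return ProtectionPolicy(
        retention_days=max(p.retention_days for p in policies),
        backup_copies=max(p.backup_copies for p in policies),
        encrypted=any(p.encrypted for p in policies),
    )


# Metadata selectors acting as tags (assumed predicates over catalog metadata).
selectors = {
    "finance": lambda m: m.get("dept") == "finance",
    "hot": lambda m: m.get("accesses_per_day", 0) > 100,
}

# Each tag (dataset) carries its own protection policy.
tag_policies = {
    "finance": ProtectionPolicy(retention_days=2555, backup_copies=2, encrypted=True),
    "hot": ProtectionPolicy(retention_days=30, backup_copies=3, encrypted=False),
}

# An archive whose members fall into different datasets.
archive = ArchiveNode("backup.tar", {}, [
    ArchiveNode("ledger.db", {"dept": "finance"}),
    ArchiveNode("cache.idx", {"accesses_per_day": 500}),
])

tags = attach_tags(archive, selectors)
policy = merge_most_restrictive(tag_policies[t] for t in tags)
print(sorted(tags), policy)
```

In this sketch the archive ends up in both the `finance` and `hot` datasets, so the merged policy keeps the longer retention, the higher copy count, and encryption. A backup process would then apply that single merged policy to the whole archive.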