US 12,216,670 B2
Monitoring data usage to optimize storage placement and access using content-based datasets
Adam Brenner, Mission Viejo, CA (US); Jehuda Shemer, Kfar Saba (IL); Steven Sadhwani, Round Rock, TX (US); Valerie Lotosh, Ramat-Gan (IL); and Erez Sharvit, Ramat-Gan (IL)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Oct. 27, 2022, as Appl. No. 17/975,368.
Prior Publication US 2024/0143610 A1, May 2, 2024
Int. Cl. G06F 16/2458 (2019.01); G06F 11/34 (2006.01); G06F 16/2455 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/2462 (2019.01) [G06F 11/3409 (2013.01); G06F 16/24564 (2019.01); G06F 16/285 (2019.01)]
16 Claims
 
1. A computer-implemented method of optimizing storage placement and access to content data in a data processing system using content-based datasets, comprising:
executing a processor-based search engine to scan content data stored in a database to identify metadata associated with data elements to be monitored on the basis of monitoring attributes including usage and access patterns;
gathering the identified metadata for storage in a data catalog;
executing, by the search engine, a user entered query against the metadata in the data catalog to automatically generate a dataset comprising metadata selectors as tags for matching the identified metadata;
defining rules based on the monitoring attributes, wherein a rule dictates a storage location of data or access permissions to the data by one or more persons or groups in the system and a subjectively scaled characterization of the data as important in a current project by the one or more persons;
applying a protection policy to the dataset to protect the content data, wherein the dataset spans multiple storage devices of different storage types, and wherein the dataset defines a single data access unit for the referenced data elements from the database, and further wherein the monitoring attributes are applied to the referenced data elements as a single unit based on data content rather than data location, and yet further wherein the content data comprises archive data classified into multiple datasets, each having different protection policies;
recursively processing the archive data to attach corresponding tags to each dataset;
merging policies of the corresponding tags based on a most restrictive policy of the different protection policies for the archive data to define the protection policy; and
executing a processor-based backup process using the protection policy to store the archive data in storage media.
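
Illustrative sketch (not part of the claim): the last three steps of claim 1 (recursively processing archive data to collect dataset tags, merging the tagged policies by the most restrictive one, and handing the result to a backup process) might look roughly like the Python below. Every name here (Policy, Dataset, collect_tags, merge_most_restrictive, archive_policy) is hypothetical, as are the policy fields. The ordering of "restrictive" (longer retention, more copies, encryption required) is an assumption made for illustration; the claim does not define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy fields; the claim does not enumerate them.
    retention_days: int   # assumed: longer retention is more restrictive
    copies: int           # assumed: more copies is more restrictive
    encrypt: bool         # assumed: requiring encryption is more restrictive


@dataclass
class Dataset:
    name: str             # used as the tag attached to matching data
    selectors: dict       # metadata key -> required value
    policy: Policy

    def matches(self, metadata):
        # A data element belongs to the dataset when every selector matches,
        # i.e. membership depends on content metadata, not on location.
        return all(metadata.get(k) == v for k, v in self.selectors.items())


def collect_tags(node, datasets, tags=None):
    """Recursively walk an archive and collect the tags of matching datasets.

    A node is assumed to be a dict with a 'metadata' dict and an optional
    'children' list (for example, a zip nested inside a tar).
    """
    if tags is None:
        tags = set()
    for ds in datasets:
        if ds.matches(node.get("metadata", {})):
            tags.add(ds.name)
    for child in node.get("children", []):
        collect_tags(child, datasets, tags)
    return tags


def merge_most_restrictive(policies):
    """Combine policies field by field, keeping the strictest value of each."""
    policies = list(policies)
    if not policies:
        raise ValueError("no policies to merge")
    return Policy(
        retention_days=max(p.retention_days for p in policies),
        copies=max(p.copies for p in policies),
        encrypt=any(p.encrypt for p in policies),
    )


def archive_policy(archive, datasets):
    """Derive the single protection policy to apply to the whole archive."""
    by_name = {ds.name: ds for ds in datasets}
    tags = collect_tags(archive, datasets)
    return merge_most_restrictive(by_name[t].policy for t in tags)
```

Under these assumptions, an archive holding one file tagged with a long-retention, encrypted dataset and another tagged with a short-retention dataset would be backed up under the long-retention, encrypted policy, because the merged policy takes the strictest value of each field.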