US 12,086,098 B2
File tiering to different storage classes of a cloud storage provider
Rabi Shankar Shaw, Bangalore (IN); Anurag Bhatnagar, Bangalore (IN); Joyanto Biswas, Bangalore (IN); and Akshay Jagirdar, Bengaluru (IN)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Aug. 18, 2021, as Appl. No. 17/405,707.
Prior Publication US 2023/0058908 A1, Feb. 23, 2023
Int. Cl. G06F 15/16 (2006.01); G06F 16/11 (2019.01); G06F 16/16 (2019.01); G06F 16/17 (2019.01); G06F 16/182 (2019.01)
CPC G06F 16/113 (2019.01) [G06F 16/119 (2019.01); G06F 16/164 (2019.01); G06F 16/1734 (2019.01); G06F 16/1827 (2019.01)] 20 Claims
 
1. An apparatus comprising:
at least one processing platform comprising a plurality of processing devices each comprising a processor coupled to a memory;
said at least one processing platform being configured:
to receive an input specifying one or more rules for archiving a plurality of files from a source storage location to a target storage location, wherein the target storage location comprises a cloud storage platform comprising a plurality of storage classes representing respective storage levels in the cloud storage platform;
wherein the plurality of storage classes comprise at least: (i) a first storage class corresponding to a first file retrieval speed and a first cost per data unit; and (ii) a second storage class corresponding to a second file retrieval speed greater than the first file retrieval speed and a second cost per data unit greater than the first cost per data unit;
to retrieve one or more of the plurality of files from the source storage location for migration to the target storage location based at least in part on the one or more rules, wherein the one or more rules include a first criterion specifying a first last file access time threshold for identifying which of the plurality of files to retrieve from the source storage location; and
to control assignment of the one or more of the plurality of files to respective ones of the plurality of storage classes based at least in part on the one or more rules;
wherein the one or more rules specify one or more constraints for the assignment of the one or more of the plurality of files to the respective ones of the plurality of storage classes, the one or more constraints comprising at least a second criterion for determining respective subsets of the one or more of the plurality of files to assign to the respective ones of the plurality of storage classes, the second criterion specifying: (i) a second last file access time threshold; and (ii) one or more attribute changes to one or more i-node fields of the one or more of the plurality of files, the one or more attribute changes comprising a file ownership change and a link count change for the one or more of the plurality of files;
wherein, in controlling the assignment of the one or more of the plurality of files to the respective ones of the plurality of storage classes, said at least one processing platform is configured:
to determine whether a given file of the one or more of the plurality of files has a last file access time greater than the second last file access time threshold and includes the one or more attribute changes; and
to write one of: (i) a first object corresponding to the given file in the first storage class on the cloud storage platform if the last file access time of the given file is determined to be greater than the second last file access time threshold, the given file is determined to include the one or more attribute changes and the given file belongs to a first one of the respective subsets; and (ii) a second object corresponding to the given file in the second storage class on the cloud storage platform if the last file access time of the given file is determined to be less than the second last file access time threshold, the given file is determined to include the one or more attribute changes and the given file belongs to a second one of the respective subsets; and
wherein said at least one processing platform is further configured:
to display progress of the assignment of the one or more of the plurality of files to the respective ones of the plurality of storage classes, wherein the display of the progress of the assignment comprises: (i) details of progress of retrieval of at least one timestamp associated with the last file access time of the given file; (ii) an indication of a temporal relationship between the last file access time of the given file and a time for execution of the archiving; and (iii) details of whether the given file was tiered to the first storage class or the second storage class; and
to generate an interface for a user to define the one or more rules, wherein the interface comprises a plurality of editable fields for the user to input task parameters comprising: (i) a path for the source storage location; (ii) a path for the target storage location; (iii) the one or more constraints; and (iv) one or more protocols to use for reading the plurality of files and for generating a plurality of stub files;
wherein at least one configuration file is implemented in the at least one processing platform and is accessible by a job scheduler via an application programming interface to control a mode of operation of the at least one processing platform.
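The two-stage tiering decision recited in claim 1 can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the patented implementation. It reads "last file access time greater than the threshold" as elapsed time since last access, and models the attribute changes by comparing the i-node owner (uid) and link count against a hypothetical earlier snapshot. It requires both changes to be present and omits the subset-membership condition, the cloud upload, the progress display, and stub-file generation. All class, function, and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class StorageClass(Enum):
    # Hypothetical labels for the claim's "first" and "second" storage classes.
    FIRST = "first"    # slower retrieval, lower cost per data unit
    SECOND = "second"  # faster retrieval, higher cost per data unit


@dataclass(frozen=True)
class InodeSnapshot:
    uid: int    # file owner (ownership field of the i-node)
    nlink: int  # hard-link count


@dataclass(frozen=True)
class FileInfo:
    path: str
    seconds_since_access: float  # ASSUMPTION: access time expressed as elapsed age
    baseline: InodeSnapshot      # hypothetical earlier snapshot of the i-node
    current: InodeSnapshot       # i-node fields as read at archive time


@dataclass(frozen=True)
class TieringRule:
    retrieve_after_s: float  # first last-access threshold: which files to migrate
    tier_after_s: float      # second last-access threshold: which class to use


def has_attribute_changes(f: FileInfo) -> bool:
    """True only if both the owner and the link count changed (assumed reading)."""
    return f.current.uid != f.baseline.uid and f.current.nlink != f.baseline.nlink


def select_for_migration(files: List[FileInfo], rule: TieringRule) -> List[FileInfo]:
    """First criterion: keep files not accessed for longer than the first threshold."""
    return [f for f in files if f.seconds_since_access > rule.retrieve_after_s]


def assign_class(f: FileInfo, rule: TieringRule) -> Optional[StorageClass]:
    """Second criterion: pick a storage class, or None where claim 1 is silent."""
    if not has_attribute_changes(f):
        return None  # placement of files without the attribute changes is not recited
    if f.seconds_since_access > rule.tier_after_s:
        return StorageClass.FIRST   # older files go to the cheaper, slower class
    if f.seconds_since_access < rule.tier_after_s:
        return StorageClass.SECOND  # more recently accessed files go to the faster class
    return None  # exactly at the threshold: not specified by the claim


if __name__ == "__main__":
    DAY = 86_400
    rule = TieringRule(retrieve_after_s=30 * DAY, tier_after_s=180 * DAY)
    before, after = InodeSnapshot(uid=1000, nlink=2), InodeSnapshot(uid=1001, nlink=1)
    files = [
        FileInfo("/data/a.log", 365 * DAY, before, after),
        FileInfo("/data/b.log", 60 * DAY, before, after),
        FileInfo("/data/c.log", 10 * DAY, before, after),
    ]
    for f in select_for_migration(files, rule):
        print(f.path, assign_class(f, rule))
```

Returning None for files without both attribute changes, and for a file sitting exactly at the second threshold, keeps the sketch limited to what claim 1 actually recites; a working archiver would need an explicit default policy for those cases.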