| CPC G06F 16/113 (2019.01) [G06F 16/119 (2019.01); G06F 16/285 (2019.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 5/025 (2013.01); G06N 20/00 (2019.01)] | 11 Claims |

|
1. A computer-implemented method for data archiving using machine learning, comprising:
receiving statistical information related to user access of a plurality of files stored in a cold storage device;
generating a sequence-to-sequence model that is configured to determine, for an initial sequence of prior file requests, a resulting sequence of subsequent file requests that have a greatest likelihood of being received, wherein the sequence-to-sequence model determines the resulting sequence by identifying patterns of access in the statistical information, and wherein a first pattern of access comprises an execution of an application compatible with a given file type and a subsequent request to access a file of the given file type;
executing, by a hardware processor, the sequence-to-sequence model on an input sequence comprising a first ordered set of file requests that includes an execution of the application compatible with the given file type;
receiving, from the sequence-to-sequence model, an output sequence comprising a second ordered set of file requests for at least one of the plurality of files having the given file type;
modifying a threshold value indicative of whether to store the at least one of the plurality of files in a hot storage device responsive to the resulting sequence of subsequent file requests that have the greatest likelihood of being received, wherein the hot storage device has quicker data retrieval than the cold storage device,
wherein the threshold value is a file group specific threshold value calculated for a particular group of files from the plurality of files responsive to the resulting sequence of subsequent file requests that have the greatest likelihood of being received,
wherein the particular group of files is determined based on having at least one same file type from among a plurality of file types; and
migrating the at least one of the plurality of files from the cold storage device to the hot storage device based on the modified threshold value.
|
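The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: a first-order transition-count predictor stands in for the sequence-to-sequence model, the threshold is assumed to be a minimum access count per file-type group, and the modification is assumed to lower that threshold by a fixed step for each predicted request. All names (`train_successors`, `predict_sequence`, `modify_thresholds`, `migrate`, the `exec:` prefix, and the sample file names) are hypothetical.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def train_successors(history):
    """Count which request follows which (toy stand-in for the seq2seq model)."""
    successors = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        successors[prev][nxt] += 1
    return successors


def predict_sequence(successors, input_sequence, length=2):
    """Greedily emit the most frequent follow-up requests after the last input."""
    output, current = [], input_sequence[-1]
    for _ in range(length):
        followers = successors.get(current)
        if not followers:
            break
        current = followers.most_common(1)[0][0]
        output.append(current)
    return output


def file_type(name):
    """Group key: the lower-cased file extension ('' for non-file requests)."""
    return PurePosixPath(name).suffix.lower()


def modify_thresholds(thresholds, predicted, step=2):
    """Assumed rule: lower a group's threshold once per predicted request in it."""
    updated = dict(thresholds)
    for request in predicted:
        group = file_type(request)
        if group in updated:
            updated[group] = max(0, updated[group] - step)
    return updated


def migrate(cold, hot, scores, thresholds):
    """Move files whose access count meets their group's threshold to hot storage."""
    for name in sorted(cold):  # sorted() copies, so mutating `cold` is safe
        if scores[name] >= thresholds.get(file_type(name), float("inf")):
            cold.remove(name)
            hot.add(name)


# Statistical information: an ordered log of application launches and file requests.
history = ["exec:viewer", "a.pdf", "b.pdf", "exec:editor",
           "notes.txt", "exec:viewer", "a.pdf", "b.pdf"]
scores = Counter(history)                  # access count per request
cold = {"a.pdf", "b.pdf", "c.pdf", "notes.txt"}
hot = set()
thresholds = {".pdf": 5, ".txt": 5}        # one threshold per file-type group

model = train_successors(history)
predicted = predict_sequence(model, ["exec:viewer"])   # input: application launch
new_thresholds = modify_thresholds(thresholds, predicted)
migrate(cold, hot, scores, new_thresholds)
```

In this sketch only the group whose file type appears in the predicted sequence has its threshold changed, so files of other types stay in cold storage even when their access counts are comparable.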