CPC G06F 21/566 (2013.01) [G06F 21/568 (2013.01)]; 20 Claims

1. A system comprising:
one or more computing devices comprising one or more hardware processors and further comprising computer memory that comprises programming instructions that, as executed by the one or more computing devices, cause the system to:
determine whether a second data file is encrypted, based on an analysis of a first backup copy and a second backup copy,
wherein the second data file, which is stored at a primary data storage, is a later version of a first data file,
wherein the first backup copy was generated from the first data file at a first time,
wherein the second backup copy was generated from the second data file at a second time after the first time; and
wherein to perform the analysis, the system is configured to:
restore the first backup copy as a first restored data file at a second data storage that is distinct from the primary data storage,
restore the second backup copy as a second restored data file at the second data storage,
based on determining that both of: (a) a measure of similarity between the first restored data file and the second restored data file is greater than a similarity threshold value, wherein the measure of similarity is based on applying a SimHash algorithm to the first restored data file, and (b) an entropy difference between the first restored data file and the second restored data file is greater than an entropy threshold value, wherein the entropy difference is based on applying a Shannon entropy algorithm to the first restored data file:
(i) cause the second data file and the second backup copy to be disqualified from storage operations in the system, wherein the first backup copy and the second backup copy were generated by at least one of the one or more computing devices of the system, and
(ii) cause the first backup copy to be restored as the first data file at the primary data storage and to replace the second data file at the primary data storage, and
based on determining that both of: (A) the measure of similarity between the first restored data file and the second restored data file is less than the similarity threshold value, and (B) the entropy difference between the first restored data file and the second restored data file is less than the entropy threshold value: cause the second data file and the second backup copy to be marked as usable for storage operations in the system.
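The claim states the decision rule but leaves several details open. It does not say how a SimHash output becomes a single "measure of similarity", how files are split into features, or which way the entropy difference is signed. It also recites applying each algorithm to the first restored data file, although any comparison needs a value for both versions. The sketch below is one possible reading under stated assumptions, not the patentee's implementation. It assumes 4-byte shingles hashed with BLAKE2b into a 64-bit SimHash, treats the "measure" as the Hamming distance between the two fingerprints (so a larger value means the versions diverged, which matches the claim flagging encryption when the measure exceeds the threshold), and takes the entropy difference as second minus first in bits per byte. The names `simhash64`, `hamming`, `shannon_entropy`, and `classify`, and the thresholds 10 and 1.0, are hypothetical.

```python
import hashlib
import math
from collections import Counter


def simhash64(data: bytes, shingle: int = 4) -> int:
    """64-bit SimHash over overlapping byte shingles (shingle size is an assumption)."""
    counts = [0] * 64
    # Each overlapping window of `shingle` bytes is one feature with weight 1.
    for i in range(max(len(data) - shingle + 1, 1)):
        digest = hashlib.blake2b(data[i:i + shingle], digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Every feature votes +1 or -1 on each fingerprint bit.
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A fingerprint bit is set where the positive votes win.
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def classify(first: bytes, second: bytes,
             sim_threshold: int = 10, entropy_threshold: float = 1.0) -> str:
    """Compare two restored versions of a file (thresholds are hypothetical).

    Assumed reading: the SimHash "measure" is the Hamming distance between
    fingerprints, and the entropy difference is second minus first.
    """
    distance = hamming(simhash64(first), simhash64(second))
    entropy_diff = shannon_entropy(second) - shannon_entropy(first)
    if distance > sim_threshold and entropy_diff > entropy_threshold:
        # Branch (i)/(ii): disqualify the second version and its backup copy,
        # then restore the first backup copy over it at primary storage.
        return "disqualify"
    if distance < sim_threshold and entropy_diff < entropy_threshold:
        # Branch (A)/(B): mark the second version and its backup copy usable.
        return "usable"
    # Mixed or boundary results: the claim does not say what happens here.
    return "undetermined"


if __name__ == "__main__":
    v1 = b"The quick brown fox jumps over the lazy dog. " * 50
    v2 = v1.replace(b"dog.", b"cat.", 1)  # small legitimate edit
    # Hypothetical stand-in for ransomware: XOR with a SHA-256 keystream.
    pad = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest()
                   for i in range(len(v1) // 32 + 1))
    v2_enc = bytes(x ^ k for x, k in zip(v1, pad))
    print(classify(v1, v2), classify(v1, v2_enc))
```

In this sketch a small edit leaves the fingerprint essentially unchanged and barely moves the entropy. Encryption-like output both scrambles the fingerprint and pushes entropy toward 8 bits per byte, so only the second pair trips both thresholds. If the measure were instead expressed as a normalized similarity such as `1 - distance / 64`, the direction of the first comparison would have to flip, so the chosen reading of the claim determines which form to use.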