US 11,960,603 B2
Multi-step approach for ransomware detection
Adwait Bhave, Pune (IN); Hemanshu Asolia, Pune (IN); and Neeraj Thakur, Pune (IN)
Assigned to Druva Inc., Santa Clara, CA (US)
Filed by Druva Inc., Santa Clara, CA (US)
Filed on Apr. 24, 2018, as Appl. No. 15/961,230.
Claims priority of application No. 201741014571 (IN), filed on Apr. 25, 2017.
Prior Publication US 2018/0307839 A1, Oct. 25, 2018
Int. Cl. H04L 9/00 (2022.01); G06F 21/55 (2013.01); G06F 21/56 (2013.01)
CPC G06F 21/566 (2013.01) [G06F 21/552 (2013.01); G06F 21/56 (2013.01); G06F 2221/034 (2013.01)]
20 Claims
1. A server manager, comprising:
a server interface configured to:
retrieve, from a storage device, a plurality of backups stored by a client device, the plurality of backups corresponding to images of the client device in a plurality of backup cycles; and
transmit information describing whether one or more of the backups include a ransomware; and
a ransomware detection module communicatively coupled to the server interface and comprising memory configured to store computer code that comprises instructions, wherein the instructions, when executed by one or more processors, cause the one or more processors to:
generate a standard pattern of file activities of the client device based on a plurality of prior backups that are prior to a particular backup, the standard pattern being specific to the client device and generated by analyzing the plurality of prior backups, wherein generating the standard pattern specific to the client device comprises:
in a learning phase specific to the client device:
defining a number of bins, each bin covering a range of a ratio of file activities, wherein the ratio of file activities corresponds to a total number of file activities in a portion of a backup relative to a total number of files in the portion;
generating a distribution of file activities in the plurality of prior backups, wherein generating the distribution comprises classifying each of a plurality of portions of the prior backups into one of the bins based on the range of the ratio into which the portion falls; and
determining a datapoint count in each bin in the distribution;
perform a statistical behavior analysis on the particular backup based on the standard pattern identified, the statistical behavior analysis identifying a particular portion of the particular backup corresponding to a statistical anomaly different from the standard pattern, the particular portion of the particular backup comprising a plurality of files, wherein the statistical behavior analysis comprises:
in a prediction phase that is performed based on the distribution specific to the client device generated in the learning phase:
determining a particular ratio of a particular number of file activities in the particular portion relative to a total number of the plurality of files in the particular portion;
classifying the particular portion to a particular bin in the distribution based on the particular ratio;
determining an anomaly score corresponding to the particular portion, wherein the anomaly score is determined based at least on (1) the datapoint count of the particular bin to which the particular portion is classified and (2) the datapoint counts of two bins in the distribution that are adjacent to the particular bin;
identify, for the portion of the particular backup corresponding to the statistical anomaly, one or more files that are new or modified;
generate, for each of the one or more identified files that is new or modified, an entropy score, the entropy score representing a randomness of a distribution of bits in each of the identified files;
determine, for each of the one or more identified files whose entropy score exceeds a threshold, whether header information of the identified file has been changed by comparing the header information to corresponding header information in one of the prior backups;
determine, for each of the one or more identified files that is new or modified, that the identified file is encrypted if the identified file has the entropy score that exceeds the threshold and the header information of the identified file has been changed; and
perform, for each of the identified files that is encrypted, a journal walk to restore an unencrypted version of the identified file.
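
The learning phase, prediction phase, and entropy and header check recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions because the claim does not fix them: the equal-width bins over the ratio range 0 to 1, the specific scoring formula (one minus the share of prior datapoints in the particular bin and its two adjacent bins), the byte-level Shannon entropy, the 7.5 bits-per-byte threshold, and the treatment of the header as a short byte prefix. All function names are hypothetical.

```python
import math
from collections import Counter

def bin_index(ratio, num_bins, max_ratio=1.0):
    """Map a file-activity ratio to a bin (equal-width bins are an assumption)."""
    clipped = min(max(ratio, 0.0), max_ratio)
    return min(int(clipped / max_ratio * num_bins), num_bins - 1)

def learn_distribution(prior_portions, num_bins):
    """Learning phase: datapoint count per bin over portions of prior backups.

    prior_portions is an iterable of (file_activities, total_files) pairs.
    """
    counts = [0] * num_bins
    for activities, total_files in prior_portions:
        if total_files == 0:
            continue  # an empty portion has no defined ratio
        counts[bin_index(activities / total_files, num_bins)] += 1
    return counts

def anomaly_score(counts, activities, total_files):
    """Prediction phase: score a portion from its bin and the two adjacent bins.

    Assumed formula: 1 - (count in bin and its neighbours) / (all datapoints).
    A score near 1 means the ratio was rarely or never seen for this client.
    """
    i = bin_index(activities / total_files, len(counts))
    left = counts[i - 1] if i > 0 else 0
    right = counts[i + 1] if i < len(counts) - 1 else 0
    total = sum(counts)
    if total == 0:
        return 1.0  # no learned history: treat everything as anomalous
    return 1.0 - (counts[i] + left + right) / total

def shannon_entropy(data):
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def looks_encrypted(data, header_now, header_prior, threshold=7.5):
    """Flag a file only if entropy exceeds the threshold AND its header changed."""
    return shannon_entropy(data) > threshold and header_now != header_prior

# Learned history: this client usually touches 1-12% of files per portion.
history = [(1, 100), (2, 100), (5, 100), (3, 100), (12, 100)]
counts = learn_distribution(history, num_bins=10)
typical = anomaly_score(counts, 2, 100)     # ratio seen often before
suspicious = anomaly_score(counts, 95, 100) # 95% of files touched at once
```

In this sketch only portions with a high anomaly score would be passed on to the per-file entropy and header comparison, which mirrors the ordering of the claimed steps: the inexpensive statistical check narrows the set of files before the costlier content inspection.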