CPC G06F 21/566 (2013.01) [G06F 21/53 (2013.01); G06F 2221/033 (2013.01)] | 8 Claims |
1. A computer-implementable method for training an ensemble of classifiers to determine malware families of malware, the method comprising:
receiving a given sample of training malware of a plurality of samples of training malware;
analyzing the given sample of training malware in an isolated environment;
generating a respective behavioral report including indications of actions executed by the given sample of training malware in the isolated environment;
identifying, within the respective behavioral reports associated with each one of the plurality of samples of training malware, a report group of behavioral reports associated with the samples of training malware of a given malware family;
determining, by analyzing actions in the report group associated with the given malware family, reference actions common to every sample of training malware in the given malware family;
generating, for a given behavioral report of the report group, a respective training feature vector of a respective plurality of training feature vectors associated with the given malware family, wherein generating a given value of the respective training feature vector comprises:
determining whether a respective field of the given behavioral report corresponds to a respective reference action associated with the given malware family;
training a given classifier of the ensemble of classifiers, based on the respective plurality of training feature vectors to determine if a given in-use sample of malware is of the given malware family; and
using the ensemble of classifiers to identify the given in-use sample of malware by:
receiving the given in-use sample of malware;
analyzing the given in-use sample of malware in the isolated environment;
generating an in-use behavioral report including indications of actions executed by the given in-use sample of malware;
generating a given in-use feature vector associated with the given in-use sample of malware,
a given value of the in-use feature vector being generated based on data in a given field of the in-use behavioral report which corresponds to a respective reference action associated with a respective malware family;
feeding the given in-use feature vector to a respective classifier of the ensemble of classifiers associated with the respective malware family to generate a prediction outcome indicative of whether the given in-use sample of malware is of the respective malware family or not; and
storing data of the prediction outcome in association with the given in-use sample of malware for further use in identifying the malware.
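The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything concrete in it is an assumption made for illustration: each behavioral report is modeled as a set of action strings, the action names are hypothetical, each feature value is a 0/1 flag for whether a reference action appears in the report, and `ThresholdClassifier` is a simple stand-in for whatever classifier an actual embodiment would train. The stand-in also uses reports from other families as negative examples, which the claim does not specify.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a behavioral report is modeled as a set of action strings.
Report = Set[str]


def reference_actions(report_group: List[Report]) -> List[str]:
    """Return the actions common to every report of one family's report group."""
    if not report_group:
        return []
    return sorted(set.intersection(*report_group))


def feature_vector(report: Report, refs: List[str]) -> List[int]:
    """One 0/1 value per reference action: 1 if the report contains it."""
    return [1 if action in report else 0 for action in refs]


def score(vec: List[int]) -> float:
    """Fraction of reference actions observed (0.0 for an empty vector)."""
    return sum(vec) / len(vec) if vec else 0.0


class ThresholdClassifier:
    """Stand-in binary classifier (an assumption, not taken from the claim)."""

    def __init__(self) -> None:
        self.threshold = 0.5

    def fit(self, positives: List[List[int]], negatives: List[List[int]]) -> None:
        # Place the threshold midway between the mean scores of both classes.
        pos = sum(map(score, positives)) / len(positives) if positives else 1.0
        neg = sum(map(score, negatives)) / len(negatives) if negatives else 0.0
        self.threshold = (pos + neg) / 2

    def predict(self, vec: List[int]) -> bool:
        return score(vec) >= self.threshold


Ensemble = Dict[str, Tuple[List[str], ThresholdClassifier]]


def train_ensemble(reports_by_family: Dict[str, List[Report]]) -> Ensemble:
    """Train one classifier per malware family."""
    ensemble: Ensemble = {}
    for family, group in reports_by_family.items():
        refs = reference_actions(group)
        positives = [feature_vector(r, refs) for r in group]
        # Assumption: reports of all other families serve as negatives.
        negatives = [
            feature_vector(r, refs)
            for other, other_group in reports_by_family.items()
            if other != family
            for r in other_group
        ]
        clf = ThresholdClassifier()
        clf.fit(positives, negatives)
        ensemble[family] = (refs, clf)
    return ensemble


def classify(report: Report, ensemble: Ensemble) -> Dict[str, bool]:
    """Build a per-family in-use feature vector and query that family's classifier."""
    return {
        family: clf.predict(feature_vector(report, refs))
        for family, (refs, clf) in ensemble.items()
    }
```

In this sketch, `classify` returns one prediction outcome per family, and the returned dictionary can be stored alongside the in-use sample. A real system would derive the reports from sandbox output and would likely use a richer feature encoding and a learned model instead of the threshold rule.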