CPC G06F 21/51 (2013.01) [G06F 21/54 (2013.01); G06F 21/563 (2013.01); G06F 21/568 (2013.01); G06N 20/00 (2019.01)] | 11 Claims |
1. An apparatus comprising:
a communication processor to access a log file, the log file including a memory dump of first functions corresponding to a first portable executable file that failed to execute in a sandbox environment;
a log file filtration controller to exclude at least one known clean function from the first functions of the log file to generate a filtered log file;
a log file normalization controller to generate normalized arguments of second functions of the filtered log file, the log file normalization controller to:
identify a first argument of a first function of the second functions of the filtered log file;
determine a data type of the first argument of the first function;
identify a placeholder representative of the data type of the first argument; and
replace the first argument of the first function with the placeholder representative of the data type of the first argument to generate a first normalized argument of the normalized arguments;
a feature vector generation controller to:
generate an empty version of a feature vector; and
after generating the empty version of the feature vector, populate the feature vector with n-gram groupings of the second functions having the normalized arguments; and
a machine learning engine to train, using stochastic gradient descent, a machine learning model with the feature vector, the machine learning model to be deployed to an end-user device to detect malware in a second portable executable file.
|
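The elements recited in claim 1 (filtering known clean functions, replacing arguments with data-type placeholders, populating an initially empty feature vector with n-gram groupings, and training with stochastic gradient descent) can be pictured with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the claim: the one-call-per-line log format, the `KNOWN_CLEAN` list, the placeholder strings such as `<INT>`, the bigram default, and the choice of logistic regression as the model.

```python
import math
import random
import re
from collections import Counter

# Hypothetical log format: one call per line, e.g. CreateFileW("a.txt", 0x80000000)
CALL_RE = re.compile(r'^(\w+)\((.*)\)$')
KNOWN_CLEAN = {"GetTickCount", "Sleep"}  # illustrative list, not from the source


def parse_calls(log_text):
    """Return (name, [args]) pairs. Naive comma split: assumes no commas inside arguments."""
    calls = []
    for line in log_text.splitlines():
        m = CALL_RE.match(line.strip())
        if m:
            raw = m.group(2).strip()
            calls.append((m.group(1), [a.strip() for a in raw.split(",")] if raw else []))
    return calls


def filter_clean(calls):
    """Exclude known clean functions to obtain the filtered log."""
    return [(name, args) for name, args in calls if name not in KNOWN_CLEAN]


def placeholder(arg):
    """Map an argument to a placeholder for its (assumed) data type."""
    if re.fullmatch(r'0[xX][0-9a-fA-F]+', arg):
        return "<HEX>"
    if re.fullmatch(r'-?\d+', arg):
        return "<INT>"
    if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
        return "<STR>"
    return "<OTHER>"


def normalize(calls):
    """Replace every argument with its data-type placeholder."""
    return [f"{name}({','.join(placeholder(a) for a in args)})" for name, args in calls]


def ngrams(tokens, n=2):
    """Group consecutive normalized calls into n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_vector(tokens, vocab, n=2):
    """Create an empty (all-zero) vector, then populate it with n-gram counts."""
    vec = [0.0] * len(vocab)
    for gram, count in Counter(ngrams(tokens, n)).items():
        if gram in vocab:
            vec[vocab[gram]] = float(count)
    return vec


def train_sgd(samples, labels, epochs=50, lr=0.1, seed=0):
    """Logistic regression fitted with per-sample stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = [0.0] * len(samples[0]), 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, samples[i]))
            err = 1.0 / (1.0 + math.exp(-z)) - labels[i]  # predicted probability minus label
            w = [wj - lr * err * xj for wj, xj in zip(w, samples[i])]
            b -= lr * err
    return w, b
```

In this sketch, a log containing `Sleep(1000)`, `CreateFileW("a.txt", 0x80000000)` and `WriteFile(3, "data")` would be reduced to the two normalized calls `CreateFileW(<STR>,<HEX>)` and `WriteFile(<INT>,<STR>)`, which form a single bigram. The vocabulary that maps each n-gram to a vector index is supplied by the caller here; how such a vocabulary is built is left open.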