US 12,118,075 B2
Methods and apparatus to improve detection of malware in executable code
Brajesh Kumar, Bangalore (IN); Sumit Lohani, Bangalore (IN); and Sidney Gomindes, Bangalore (IN)
Assigned to McAfee, LLC, San Jose, CA (US)
Filed by McAfee, LLC, Santa Clara, CA (US)
Filed on Jul. 28, 2020, as Appl. No. 16/941,111.
Claims priority of application No. 202011022393 (IN), filed on May 28, 2020.
Prior Publication US 2021/0374229 A1, Dec. 2, 2021
Int. Cl. G06F 21/51 (2013.01); G06F 21/54 (2013.01); G06F 21/56 (2013.01); G06N 20/00 (2019.01)
CPC G06F 21/51 (2013.01) [G06F 21/54 (2013.01); G06F 21/563 (2013.01); G06F 21/568 (2013.01); G06N 20/00 (2019.01)]
11 Claims
 
1. An apparatus comprising:
a communication processor to access a log file, the log file including a memory dump of first functions corresponding to a first portable executable file that failed to execute in a sandbox environment;
a log file filtration controller to exclude at least one known clean function from the first functions of the log file to generate a filtered log file;
a log file normalization controller to generate normalized arguments of second functions of the filtered log file, the log file normalization controller to:
identify a first argument of a first function of the second functions of the filtered log file;
determine a data type of the first argument of the first function;
identify a placeholder representative of the data type of the first argument; and
replace the first argument of the first function with the placeholder representative of the data type of the first argument to generate a first normalized argument of the normalized arguments;
a feature vector generation controller to:
generate an empty version of a feature vector; and
after generating the empty version of the feature vector, populate the feature vector with n-gram groupings of the second functions having the normalized arguments; and
a machine learning engine to train, using stochastic gradient descent, a machine learning model with the feature vector, the machine learning model to be deployed to an end-user device to detect malware in a second portable executable file.
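The filtration and normalization elements of claim 1 can be illustrated with a minimal Python sketch. Every specific detail here is a hypothetical illustration, not taken from the patent. That includes the log-entry shape (a function name plus raw argument strings), the allowlist KNOWN_CLEAN, the regular expressions used to infer a data type, and placeholder tokens such as <INT>, <HEX> and <PATH>. The claim does not say how data types are detected or which placeholders are used.

```python
import re
from typing import List, Tuple

# Hypothetical log-entry shape: (function_name, [raw argument strings]).
LogEntry = Tuple[str, List[str]]

# Hypothetical allowlist of functions assumed to be known clean.
KNOWN_CLEAN = {"GetTickCount", "Sleep"}

# Hypothetical type detectors and placeholders; the first matching pattern wins.
PLACEHOLDERS = [
    (re.compile(r"^0x[0-9a-fA-F]+$"), "<HEX>"),  # hexadecimal value (address, flags, handle)
    (re.compile(r"^-?\d+$"), "<INT>"),           # decimal integer
    (re.compile(r"^[A-Za-z]:\\"), "<PATH>"),     # Windows-style file path
]
DEFAULT_PLACEHOLDER = "<STR>"  # anything unrecognized is treated as a string


def filter_log(entries: List[LogEntry]) -> List[LogEntry]:
    """Drop calls to functions on the known-clean allowlist."""
    return [entry for entry in entries if entry[0] not in KNOWN_CLEAN]


def placeholder_for(arg: str) -> str:
    """Map one raw argument to the placeholder for its inferred data type."""
    for pattern, placeholder in PLACEHOLDERS:
        if pattern.match(arg):
            return placeholder
    return DEFAULT_PLACEHOLDER


def normalize_log(entries: List[LogEntry]) -> List[str]:
    """Replace every argument with its type placeholder, giving one token per call."""
    return [
        f"{name}({','.join(placeholder_for(a) for a in args)})"
        for name, args in entries
    ]
```

For example, a log holding CreateFileW with a path and a flags value, a Sleep call, and a VirtualAlloc call would lose the Sleep entry to filtering. It would then reduce to the tokens CreateFileW(<PATH>,<HEX>) and VirtualAlloc(<HEX>,<INT>,<HEX>,<INT>). Because concrete values are replaced by type placeholders, two samples that make the same calls with different addresses or sizes yield identical tokens.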
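Claim 1 has the feature vector generation controller create an empty feature vector first and then populate it with n-gram groupings of the normalized functions. The following sketch shows one possible realization, assumed here rather than taken from the patent. It fixes a vocabulary of the n-grams seen in training traces, allocates an all-zero vector with one slot per n-gram, and then fills in occurrence counts. The function names and the normalized-token format are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """All contiguous n-grams of normalized call tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(traces: List[Sequence[str]], n: int) -> Dict[NGram, int]:
    """Assign a fixed column index to every n-gram seen in the training traces."""
    vocab: Dict[NGram, int] = {}
    for trace in traces:
        for gram in ngrams(trace, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def feature_vector(trace: Sequence[str], vocab: Dict[NGram, int], n: int) -> List[int]:
    """Build a count vector for one trace over the fixed n-gram vocabulary."""
    # Step 1: generate the empty (all-zero) vector, one slot per known n-gram.
    vec = [0] * len(vocab)
    # Step 2: populate it with n-gram counts; n-grams outside the vocabulary are ignored.
    for gram in ngrams(trace, n):
        idx = vocab.get(gram)
        if idx is not None:
            vec[idx] += 1
    return vec
```

Sequences of calls rather than isolated calls carry the signal here. With n = 2, for instance, A() followed by B(<INT>) is a different feature from B(<INT>) followed by A().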
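Claim 1 specifies only that the machine learning model is trained with stochastic gradient descent. It does not state the model family. Purely as an illustrative assumption, the sketch below uses a logistic-regression classifier over n-gram count vectors. It is trained by per-sample SGD on log loss, with label 1 for malicious and 0 for clean. The learning rate, epoch count, and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd(X: List[Sequence[float]], y: List[int], epochs: int = 50,
              lr: float = 0.1, seed: int = 0) -> List[float]:
    """Fit logistic-regression weights (last entry is the bias) by per-sample SGD."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + w[-1]
            err = sigmoid(z) - y[i]  # derivative of log loss with respect to z
            for j, xj in enumerate(X[i]):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 (malicious) if the predicted probability is at least 0.5, else 0 (clean)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if sigmoid(z) >= 0.5 else 0
```

Updating on one sample at a time keeps memory proportional to a single feature vector. That suits the large, mostly zero n-gram vectors such logs tend to produce. The learned weights are the only artifact that would need to ship to an end-user device.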