US 12,437,068 B2
Automatic threat detection of executable files based on static data analysis
Mauritius Schmidtler, Escondido, CA (US); Gaurav Dalal, San Jose, CA (US); and Reza Yoosoofmiya, San Diego, CA (US)
Assigned to OPEN TEXT INC., Menlo Park, CA (US)
Filed by Open Text Inc., Menlo Park, CA (US)
Filed on Apr. 19, 2022, as Appl. No. 17/724,419.
Application 17/724,419 is a continuation of application No. 16/791,649, filed on Feb. 14, 2020, granted, now 11,409,869.
Application 16/791,649 is a continuation of application No. 14/709,875, filed on May 12, 2015, granted, now 10,599,844, issued on Mar. 24, 2020.
Prior Publication US 2022/0237293 A1, Jul. 28, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 21/56 (2013.01); G06F 8/53 (2018.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01)
CPC G06F 21/565 (2013.01) [G06F 21/562 (2013.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01); G06F 8/53 (2013.01); G06F 2221/033 (2013.01)]
20 Claims
 
1. A computer-implemented method comprising:
identifying, by a knowledge module, static data points that may be indicative of either a harmful or benign executable file;
associating, by the knowledge module, the identified static data points with one of a plurality of categories of files, the plurality of categories of files including harmful files and benign files;
identifying an executable file to be evaluated;
extracting, by the knowledge module, a plurality of static data points from the identified executable file;
generating a feature vector from the plurality of static data points using a classifier trained to classify the static data points based on training data, the training data comprising files known to fit into one of the plurality of categories of files, wherein one or more features of the feature vector generated using the classifier are selectively turned on or off, wherein the one or more features are selectively turned on or off based on one or more values of the static data points being within a predetermined range; and
providing the generated feature vector to a support vector machine to build a probabilistic model that indicates whether the executable file fits into one of the categories of files, the generated feature vector comprising at least one feature that has been selectively turned off.
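
The range-based feature gating recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The data point names (`file_size`, `num_sections`, `entropy`), the numeric ranges, and the helper `build_feature_vector` are all hypothetical, and the sketch assumes that a feature is "on" when its value lies inside the predetermined range and "off" (zeroed) otherwise; the claim does not say which direction applies. The trained classifier and the support vector machine are outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical description of one static data point and its range."""
    name: str    # key of the static data point
    low: float   # inclusive lower bound of the predetermined range
    high: float  # inclusive upper bound of the predetermined range


# Illustrative assumptions only; the claim names no data points or ranges.
FEATURE_SPECS: List[FeatureSpec] = [
    FeatureSpec("file_size", 1_024, 50_000_000),
    FeatureSpec("num_sections", 1, 32),
    FeatureSpec("entropy", 0.0, 8.0),
]


def build_feature_vector(
    static_points: Dict[str, float],
    specs: List[FeatureSpec] = FEATURE_SPECS,
) -> Tuple[List[float], List[bool]]:
    """Return (vector, mask), one entry per spec, in spec order.

    Assumption: a feature is on when its value is present and inside
    [low, high]; otherwise it is turned off and contributes 0.0.
    """
    vector: List[float] = []
    mask: List[bool] = []
    for spec in specs:
        value = static_points.get(spec.name)
        is_on = value is not None and spec.low <= value <= spec.high
        mask.append(is_on)
        vector.append(float(value) if is_on else 0.0)
    return vector, mask


if __name__ == "__main__":
    # Hypothetical static data points extracted from one executable file.
    sample = {"file_size": 20_480, "num_sections": 97, "entropy": 7.2}
    vec, mask = build_feature_vector(sample)
    print(vec)   # num_sections is outside its range, so that entry is 0.0
    print(mask)
```

The resulting vector, with at least one feature turned off, is what the final step of the claim would pass to a support vector machine. In scikit-learn, for example, `SVC(probability=True)` is one way to obtain probability estimates from an SVM; the claim does not indicate which SVM formulation is used.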