CPC G06F 16/285 (2019.01) [G06F 16/116 (2019.01)] | 18 Claims
1. A method for clustering executable files, the method comprising:
obtaining, by a computer device, a plurality of executable files;
determining a file format of each executable file of the plurality of executable files;
for each file format:
(i) detecting, by the computer device, repeat sequences of commands of a predetermined length in a given executable file of the plurality of executable files,
a given command being represented by a portion of one of a source and machine code associated with the given executable file that includes an action to be executed by the given executable file;
(ii) determining, by the computer device, at least one frequently occurring sequence of the repeat sequences of commands in the given one of the plurality of executable files; and
based on the at least one frequently occurring sequence of commands, attributing the given executable file to a respective family;
iteratively executing the detecting, the determining, and the attributing until one of: all of the plurality of executable files are attributed to at least one respective family, and until un-attributed files of the plurality of executable files do not contain any repeat sequences of commands; and
in response to presence of un-attributed files of the plurality of executable files, attributing each of the un-attributed files of the plurality of executable files to a separate family.
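
For illustration only, the steps recited in claim 1 can be sketched in Python. This is one possible reading under stated assumptions, not the claimed implementation: each file is assumed to arrive with its format already determined and its commands already extracted as a list of strings; a "repeat sequence" is taken to be a command n-gram that occurs more than once in a file; a file is assumed to join a family when the family's most frequent sequence also repeats in that file; and the names `repeat_sequences`, `most_frequent`, `cluster`, and the default `length=2` are hypothetical.

```python
from collections import Counter, defaultdict


def repeat_sequences(commands, length):
    """Count command n-grams of the given length that occur more than once."""
    grams = Counter(
        tuple(commands[i:i + length])
        for i in range(len(commands) - length + 1)
    )
    return {gram: n for gram, n in grams.items() if n > 1}


def most_frequent(repeats):
    """Pick the highest-count sequence; ties break on the sequence itself."""
    return max(repeats, key=lambda gram: (repeats[gram], gram))


def cluster(files, length=2):
    """Map {name: (file_format, [command, ...])} to {family_id: [names]}."""
    # Group files by format so each format is clustered separately.
    by_format = defaultdict(dict)
    for name, (file_format, commands) in files.items():
        by_format[file_format][name] = commands

    families = {}
    for file_format, group in sorted(by_format.items()):
        unattributed = dict(sorted(group.items()))
        while unattributed:
            # First un-attributed file that still has a repeat sequence.
            seed = next(
                (n for n, cmds in unattributed.items()
                 if repeat_sequences(cmds, length)),
                None,
            )
            if seed is None:
                break  # remaining files contain no repeat sequences
            signature = most_frequent(
                repeat_sequences(unattributed[seed], length)
            )
            # Assumption: membership means the signature repeats in the file.
            members = [
                n for n, cmds in unattributed.items()
                if signature in repeat_sequences(cmds, length)
            ]
            families[f"{file_format}:{len(families)}"] = members
            for n in members:
                del unattributed[n]
        # Each file left un-attributed becomes a separate family.
        for n in unattributed:
            families[f"{file_format}:{len(families)}"] = [n]
    return families
```

Calling `cluster` on a small dictionary of files returns family identifiers of the form `format:index`, each mapped to the file names attributed to that family. The loop terminates because the seed file always contains its own signature, so at least one file is attributed on every pass.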