CPC G06F 16/1752 (2019.01) [G06F 3/067 (2013.01); G06F 3/0608 (2013.01); G06F 3/0641 (2013.01)]
8 Claims
1. A system for identifying a file type comprising:
a computing device comprising a processor, a memory, and a non-volatile data storage device;
a file-print extractor comprising a first plurality of programming instructions stored in the memory and operable on the processor, wherein the first plurality of programming instructions, when operating on the processor, causes the computing device to:
parse the contents of a file into groups of bits or bytes; and
generate a file-print for the file, the file-print comprising:
an actual mean value of the number of occurrences of each group in the file; and
a standard deviation of the actual mean value of the number of occurrences from an expected mean value of the number of occurrences; and
a file classifier comprising a second plurality of programming instructions stored in the memory and operable on the processor, wherein the second plurality of programming instructions, when operating on the processor, causes the processor to process the file-print through a trained machine learning classifier to identify a file type of the file.
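The claim does not fix the group size, how the "actual mean value" is taken, or which trained classifier is used. The sketch below is a minimal, hypothetical reading under stated assumptions. Each group is one byte (256 possible values). The file is cut into fixed 64-byte chunks. For every byte value, the sketch records the mean count per chunk and the standard deviation of the per-chunk counts about the count expected if all byte values were equally likely. A nearest-centroid classifier stands in for the unspecified trained machine learning classifier. The names `file_print`, `train_centroids`, `classify`, and `chunk_size` are illustrative only and do not come from the patent.

```python
import math
from collections import Counter

NUM_GROUPS = 256  # assumption: each group is a single byte (8 bits)


def file_print(data: bytes, chunk_size: int = 64) -> list[float]:
    """Hypothetical file-print: per byte value, the mean count per chunk
    followed by the standard deviation of the per-chunk counts about the
    count expected under a uniform byte distribution."""
    # Split into equal-size chunks; a trailing partial chunk is dropped so
    # every chunk shares the same expected count.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data) - chunk_size + 1, chunk_size)]
    if not chunks:
        raise ValueError("file is shorter than one chunk")
    expected = chunk_size / NUM_GROUPS  # assumed expected mean (uniform model)
    per_chunk = [Counter(chunk) for chunk in chunks]
    means, deviations = [], []
    for value in range(NUM_GROUPS):
        counts = [c[value] for c in per_chunk]  # Counter yields 0 if absent
        means.append(sum(counts) / len(counts))  # actual mean occurrences
        # Deviation of the observed counts from the expected mean
        deviations.append(
            math.sqrt(sum((x - expected) ** 2 for x in counts) / len(counts)))
    return means + deviations


def train_centroids(examples):
    """Stand-in 'trained classifier': average file-print per file type.
    examples is a list of (feature_vector, label) pairs."""
    sums, counts = {}, Counter()
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in acc]
            for label, acc in sums.items()}


def classify(vec, centroids):
    """Return the file type whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    noise = bytes((i * 97 + 13) % 256 for i in range(1024))
    model = train_centroids([(file_print(text), "text"),
                             (file_print(noise), "binary")])
    print(classify(file_print(b"a lazy dog naps in the sun " * 40), model))
```

A nearest-centroid rule is used only because it is short and self-contained. Any supervised model that accepts a fixed-length feature vector could take its place.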