| CPC G06F 21/56 (2013.01) [G06F 18/214 (2023.01); G06F 18/24 (2023.01); G06F 21/562 (2013.01); G06F 21/563 (2013.01); G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06N 5/01 (2023.01); G06N 20/20 (2019.01)] | 19 Claims |

|
1. An apparatus for detection of malicious documents using machine learning, comprising:
a memory; and
a processor operatively coupled to the memory, the processor configured to:
receive a plurality of files that are potentially malicious, each file from the plurality of files being associated with a file type and a label indicating maliciousness;
define a first feature vector, the first feature vector associated with a first file from the plurality of files, the first file being associated with a first file type and a first label;
define a second feature vector, the second feature vector associated with a second file from the plurality of files, the second file being associated with a second file type different than the first file type, and a second label;
concatenate the first feature vector with the second feature vector to define a concatenated feature vector;
send the concatenated feature vector to train a first machine learning model to produce a maliciousness classification associated with the first file type;
send the concatenated feature vector to train a second machine learning model different than the first machine learning model to produce a maliciousness classification associated with the second file type different than the first file type;
provide a third feature vector to the first machine learning model to obtain a maliciousness classification associated with a third file indicating the third file as malware; and
perform a remedial action on the third file based on the maliciousness classification associated with the third file.
|
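The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The byte-histogram `featurize` function, the `LogisticModel` class, the synthetic file contents, the 0.5 threshold, and the "quarantine"/"allow" actions are all hypothetical choices made for illustration. The claim does not say how a single third feature vector is shaped for a model trained on concatenated vectors, so the sketch assumes the slot for the other file type is zero-filled.

```python
import math

DIM = 4  # hypothetical per-file feature length


def featurize(data: bytes, dim: int = DIM) -> list:
    """Normalized histogram of byte values over `dim` equal-width bins."""
    counts = [0.0] * dim
    for byte in data:
        counts[byte * dim // 256] += 1.0
    total = max(len(data), 1)
    return [c / total for c in counts]


class LogisticModel:
    """Tiny logistic regression trained by full-batch gradient descent."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y, lr=0.5, epochs=300):
        n = len(X)
        for _ in range(epochs):
            # Errors are computed once per epoch, before any weight changes.
            errors = [self.predict_proba(x) - t for x, t in zip(X, y)]
            for j in range(len(self.w)):
                grad = sum(e * x[j] for e, x in zip(errors, X)) / n
                self.w[j] -= lr * grad
            self.b -= lr * sum(errors) / n
        return self


# Synthetic labeled files of two file types (1 = malicious, 0 = benign).
A_BENIGN, A_MAL = bytes([0x10] * 32), bytes([0xF0] * 32)
B_BENIGN, B_MAL = bytes([0x50] * 32), bytes([0xA0] * 32)
pairs = [(a, la, b, lb)
         for a, la in ((A_BENIGN, 0), (A_MAL, 1))
         for b, lb in ((B_BENIGN, 0), (B_MAL, 1))]

# Concatenate the first (type A) and second (type B) feature vectors.
X = [featurize(a) + featurize(b) for a, _, b, _ in pairs]

# Two different models are trained on the same concatenated vectors,
# one per file type, each against that file type's label.
model_a = LogisticModel(2 * DIM).fit(X, [la for _, la, _, _ in pairs])
model_b = LogisticModel(2 * DIM).fit(X, [lb for _, _, _, lb in pairs])


def classify_type_a(data: bytes, threshold: float = 0.5) -> str:
    """Score a third, type-A file with the first model and pick an action."""
    # Assumption: a lone type-A file fills the type-A slot; the type-B slot is zeros.
    x = featurize(data) + [0.0] * DIM
    return "quarantine" if model_a.predict_proba(x) >= threshold else "allow"
```

Calling `classify_type_a` on a new file's bytes returns the action string; in a real deployment the remedial action might instead be deletion, blocking, or an alert, none of which the sketch attempts.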