US 12,339,962 B2
Methods and apparatus for detection of malicious documents using machine learning
Joshua Daniel Saxe, Wichita, KS (US); Ethan M. Rudd, Colorado Springs, CO (US); and Richard Harang, Alexandria, VA (US)
Assigned to Sophos Limited, Abingdon (GB)
Filed by Sophos Limited, Abingdon (GB)
Filed on Oct. 10, 2023, as Appl. No. 18/483,795.
Application 18/483,795 is a continuation of application No. 17/314,625, filed on May 7, 2021, granted, now Pat. No. 11,822,374.
Application 17/314,625 is a continuation of application No. 16/257,749, filed on Jan. 25, 2019, granted, now Pat. No. 11,003,774, issued on May 11, 2021.
Claims priority of provisional application 62/622,440, filed on Jan. 26, 2018.
Prior Publication US 2024/0119150 A1, Apr. 11, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. H04L 29/06 (2006.01); G06F 18/214 (2023.01); G06F 18/24 (2023.01); G06F 21/56 (2013.01); G06N 3/04 (2023.01); G06N 3/045 (2023.01); G06N 5/01 (2023.01); G06N 20/20 (2019.01)
CPC G06F 21/56 (2013.01) [G06F 18/214 (2023.01); G06F 18/24 (2023.01); G06F 21/562 (2013.01); G06F 21/563 (2013.01); G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06N 5/01 (2023.01); G06N 20/20 (2019.01)]
19 Claims
 
1. An apparatus for detection of malicious documents using machine learning, comprising:
a memory; and
a processor operatively coupled to the memory, the processor configured to:
receive a plurality of files that are potentially malicious, each file from the plurality of files being associated with a file type and a label indicating maliciousness;
define a first feature vector, the first feature vector associated with a first file from the plurality of files, the first file being associated with a first file type and a first label;
define a second feature vector, the second feature vector associated with a second file from the plurality of files, the second file being associated with a second file type different than the first file type, and a second label;
concatenate the first feature vector with the second feature vector to define a concatenated feature vector;
send the concatenated feature vector to train a first machine learning model to produce a maliciousness classification associated with the first file type;
send the concatenated feature vector to train a second machine learning model different than the first machine learning model to produce a maliciousness classification associated with the second file type different than the first file type;
provide a third feature vector to the first machine learning model to obtain a maliciousness classification associated with a third file indicating the third file as malware; and
perform a remedial action on the third file based on the maliciousness classification associated with the third file.
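
The sequence of steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. Everything named here is an assumption made for illustration: the byte-histogram feature extractor (byte_histogram), the toy byte strings, the two stand-in models (a nearest-centroid classifier and a 1-nearest-neighbor classifier), the zero-padding of the third feature vector to the concatenated width, and the remediate helper. The claim itself does not specify any of these.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

def byte_histogram(data: bytes, bins: int = 4) -> Vector:
    """Hypothetical feature extractor: normalized histogram of byte values."""
    counts = [0] * bins
    for b in data:
        counts[b * bins // 256] += 1
    total = max(len(data), 1)
    return [c / total for c in counts]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroid:
    """Stand-in for the first machine learning model (assumption)."""
    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, Vector] = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: Vector) -> int:
        return min(self.centroids, key=lambda lbl: sq_dist(x, self.centroids[lbl]))

class OneNearestNeighbor:
    """Stand-in for the second, different machine learning model (assumption)."""
    def fit(self, X: List[Vector], y: List[int]) -> "OneNearestNeighbor":
        self.samples: List[Tuple[Vector, int]] = list(zip(X, y))
        return self

    def predict(self, x: Vector) -> int:
        return min(self.samples, key=lambda s: sq_dist(x, s[0]))[1]

def remediate(name: str, is_malicious: bool) -> str:
    """Hypothetical remedial action: quarantine files classified as malware."""
    return f"quarantine:{name}" if is_malicious else f"allow:{name}"

# Toy labeled data (synthetic, for illustration only); label 1 = malicious.
BENIGN = b"plain quarterly report text"
MALICIOUS = bytes([0xE8, 0xFF, 0x90, 0xCC]) * 8

# Each entry pairs a file of the first type with a file of the second type:
# (first-type bytes, first label, second-type bytes, second label)
pairs = [
    (BENIGN, 0, MALICIOUS, 1),
    (MALICIOUS, 1, BENIGN, 0),
    (BENIGN, 0, BENIGN, 0),
    (MALICIOUS, 1, MALICIOUS, 1),
]

# Define the first and second feature vectors and concatenate them.
X = [byte_histogram(a) + byte_histogram(b) for a, _, b, _ in pairs]
first_labels = [la for _, la, _, _ in pairs]
second_labels = [lb for _, _, _, lb in pairs]

# The same concatenated vectors train two different models, one per file type.
first_model = NearestCentroid().fit(X, first_labels)
second_model = OneNearestNeighbor().fit(X, second_labels)

# Third file: its features fill the first-type slot; the second-type slot is
# zero-padded (an assumption, so the width matches the training vectors).
third = byte_histogram(bytes([0xFF, 0xCC, 0xE8, 0xE8]) * 5) + [0.0] * 4
verdict = first_model.predict(third)
action = remediate("third_file", verdict == 1)
print(action)  # prints "quarantine:third_file"
```

In this sketch the training pairs are balanced so that the second-type half of each centroid is identical across classes, which means the first model's decision depends only on the first-type features. A production system would use richer features and models; the structure shown (shared concatenated input, separate per-file-type classifiers, remediation keyed to the classification) is the part that mirrors the claim.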