| CPC G06F 21/561 (2013.01) [G06N 3/0442 (2023.01); G06F 2221/034 (2013.01)] | 24 Claims |

|
1. A method for generating a feature vector for malware, the method comprising:
storing, in memory of a computing device, program code for a trained neural network that produces embedded representations for antivirus scan data;
executing, by a processor of the computing device, the program code for the trained neural network, the neural network causing the computing device to be configured to perform the operations of:
(a) receiving an antivirus scan report (AVSR) for a malware file, the AVSR having a label including plural tokens that identify an antivirus product and attributes of the malware file;
(b) normalizing each label in the AVSR by separating each label into a sequence of tokens including a set of token strings;
(c) generating an input sequence for the malware file by embedding a first token and plural second tokens from the AVSR, wherein the first token identifies a start of the input sequence and each second token corresponds to the AVSR of the malware file;
(d) inputting the input sequence into a neural model for producing antivirus scan data; and
(e) outputting the antivirus scan data produced by the neural model as one or more feature vectors.
|
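
Steps (a) through (c) of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The claim does not state how labels are split, what the start token looks like, or how tokens map to embeddings, so the following are all assumptions made for illustration: the splitting rule (break on non-alphanumeric characters, lowercase the result), the `<s>` and `<unk>` marker strings, the function names, and the vendor names and labels in the example report. The embedding lookup and the neural model of steps (d) and (e) are omitted; the sketch stops at integer token IDs that an embedding layer could consume.

```python
import re

START_TOKEN = "<s>"      # hypothetical marker for the start of the input sequence
UNKNOWN_TOKEN = "<unk>"  # hypothetical fallback for tokens missing from the vocabulary


def normalize_label(product, label):
    """Split one AV label into lowercase token strings, led by the product name.

    Assumed rule: any run of non-alphanumeric characters is a separator.
    """
    parts = re.split(r"[^A-Za-z0-9]+", label)
    return [product.lower()] + [p.lower() for p in parts if p]


def build_input_sequence(report):
    """Build the token sequence for one file: a start token, then every label's tokens."""
    tokens = [START_TOKEN]
    for product, label in report.items():
        tokens.extend(normalize_label(product, label))
    return tokens


def to_ids(tokens, vocab):
    """Map token strings to integer IDs, using the unknown-token ID as a fallback."""
    return [vocab.get(t, vocab[UNKNOWN_TOKEN]) for t in tokens]


# Made-up scan report: product name -> label assigned to the file.
example_report = {
    "VendorA": "Trojan.Win32.Agent!gen",
    "VendorB": "W32/Agent.ABC",
}

sequence = build_input_sequence(example_report)

# Toy vocabulary built from this one sequence; a real one would come from a corpus.
vocab = {tok: i for i, tok in enumerate([UNKNOWN_TOKEN] + sorted(set(sequence)))}
ids = to_ids(sequence, vocab)
```

Prefixing each label's tokens with the product name is one way to keep the antivirus product identifiable within a single flat sequence, which matches the claim's description of a label whose tokens identify both the product and attributes of the file. Other encodings, such as separator tokens between labels, would serve the same purpose.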