CPC H04L 63/1416 (2013.01) [G06N 20/00 (2019.01); H04L 63/145 (2013.01); H04L 63/1425 (2013.01)] | 20 Claims |
1. A non-transitory computer-readable storage medium having computer-readable code stored thereon for programming one or more processors to perform steps of:
obtaining file identifiers associated with files in production data, wherein the production data is live, real traffic that includes both benign and malicious data;
obtaining lab data from one or more public repositories of malware samples based on similarities of the file identifiers obtained from the live, real production data and the lab data;
deriving weights based on similarities of the production data and the obtained lab data; and
utilizing the derived weights and the obtained lab data as training data for training a machine learning process for classifying malware in the production data, wherein the training includes utilizing the derived weights for improving a recall of the machine learning process.
|
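The weighting and training steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes the lab data has already been retrieved and that each file is represented as a set of feature tokens. The use of Jaccard similarity, the `floor` parameter, the toy token-score learner, and all function names and sample data are hypothetical choices made only for illustration. The sketch shows how lab samples that resemble production traffic can receive larger training weights; it does not by itself demonstrate a recall improvement.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def derive_weights(lab_samples, production_samples, floor=0.05):
    """Weight each lab sample by its best similarity to any production sample.

    Assumption: 'similarity of the production data and the lab data' is
    approximated by the maximum Jaccard similarity; `floor` (hypothetical)
    keeps dissimilar lab samples from being discarded entirely.
    """
    weights = []
    for lab in lab_samples:
        best = max((jaccard(lab, prod) for prod in production_samples), default=0.0)
        weights.append(max(best, floor))
    return weights


def train_token_scores(lab_samples, labels, weights):
    """Toy weighted learner (stand-in for a real model).

    Each token accumulates +weight from malicious samples (label 1)
    and -weight from benign samples (label 0).
    """
    scores = Counter()
    for tokens, label, weight in zip(lab_samples, labels, weights):
        for token in tokens:
            scores[token] += weight if label == 1 else -weight
    return scores


def classify(tokens, scores, threshold=0.0):
    """Return 1 (malicious) if the summed token scores exceed the threshold."""
    return 1 if sum(scores.get(t, 0.0) for t in tokens) > threshold else 0


# Hypothetical sample data: feature tokens per file.
production = [{"upx", "inject", "http"}, {"pdf", "print"}]
lab = [{"upx", "inject", "dns"}, {"macro", "vba"}, {"pdf", "print"}]
lab_labels = [1, 1, 0]  # 1 = malicious, 0 = benign

weights = derive_weights(lab, production)
model = train_token_scores(lab, lab_labels, weights)
```

In a real pipeline the weights would more likely be passed to an established learner (many accept per-sample weights at fit time) instead of the toy scorer used here.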