US 12,292,971 B2
Cybersecurity system evaluation and configuration
Richard Edward Harang, Alexandria, VA (US)
Assigned to Sophos Limited, Abingdon (GB)
Filed by Sophos Limited, Abingdon (GB)
Filed on Nov. 12, 2021, as Appl. No. 17/524,930.
Claims priority of provisional application 63/113,462, filed on Nov. 13, 2020.
Prior Publication US 2022/0156372 A1, May 19, 2022
Int. Cl. G06F 21/56 (2013.01); G06N 20/00 (2019.01); H04L 9/40 (2022.01)
CPC G06F 21/56 (2013.01) [G06N 20/00 (2019.01); H04L 63/20 (2013.01); G06F 2221/034 (2013.01)]
20 Claims
 
1. A computer program product comprising computer executable code embodied in a non-transitory computer readable medium that, when executing on one or more computing devices, performs the steps of:
evaluating a true positive rate and a false positive rate for a malware detection system, the true positive rate corresponding to an accurate detection of malware by the malware detection system in a base data set and the false positive rate corresponding to an erroneous detection of malware in the base data set by the malware detection system, the base data set labeled with a known composition of malicious code instances and the base data set having a base rate of malware instances;
applying the malware detection system to a new data set to determine a first number of detections within the new data set;
generating a number of synthetic data sets with an estimation engine based on a distribution of malware instances within the base data set;
selecting a representative group from the number of synthetic data sets that produce a corresponding set of numbers of detection when analyzed with the malware detection system similar to the first number of detections produced within the new data set when analyzed with the malware detection system, wherein the corresponding set of numbers of detection are each within a predetermined threshold of the first number of detections, and wherein the predetermined threshold is a relative threshold scaled according to a ratio of a size of the new data set to the size of each of the synthetic data sets; and
determining a malware detection metric for the new data set based on a statistical composition of the representative group.
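
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not specify: the "estimation engine" is modeled as draws from a Beta distribution whose mean equals the base rate, the detector is simulated from its true positive and false positive rates alone, the relative threshold is applied after rescaling each synthetic detection count by the ratio of data set sizes, and the "malware detection metric" is taken to be the mean malware rate of the representative group. All function names and parameters (`evaluate_rates`, `simulate_detections`, `estimate_malware_rate`, `concentration`, `rel_tol`) are hypothetical.

```python
import random


def evaluate_rates(labels, detections):
    """Return (TPR, FPR) measured on a labeled base data set.

    labels[i] is truthy for a known malicious sample; detections[i] is
    truthy when the detector flagged it. Assumes both classes are present.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    true_pos = sum(1 for y, d in zip(labels, detections) if y and d)
    false_pos = sum(1 for y, d in zip(labels, detections) if not y and d)
    return true_pos / positives, false_pos / negatives


def simulate_detections(size, n_malware, tpr, fpr, rng):
    """Simulate how many detections one synthetic data set produces."""
    hits = sum(rng.random() < tpr for _ in range(n_malware))
    false_alarms = sum(rng.random() < fpr for _ in range(size - n_malware))
    return hits + false_alarms


def estimate_malware_rate(first_detections, new_size, tpr, fpr, base_rate,
                          synth_size=1000, n_synth=2000, rel_tol=0.05,
                          concentration=20.0, seed=0):
    """Estimate the malware rate of the new data set.

    Returns (mean rate of the representative group, group size), or
    (None, 0) when no synthetic data set falls within the threshold.
    """
    rng = random.Random(seed)
    # Ratio of new data set size to synthetic data set size.
    scale = new_size / synth_size
    # Relative threshold (assumed form): a fraction of the observed count.
    tolerance = rel_tol * first_detections
    group = []
    for _ in range(n_synth):
        # Assumed estimation engine: Beta prior with mean equal to base_rate.
        rate = rng.betavariate(base_rate * concentration,
                               (1.0 - base_rate) * concentration)
        n_malware = round(rate * synth_size)
        detections = simulate_detections(synth_size, n_malware, tpr, fpr, rng)
        # Keep synthetic sets whose rescaled count is close to the observed one.
        if abs(detections * scale - first_detections) <= tolerance:
            group.append(n_malware / synth_size)
    if not group:
        return None, 0
    return sum(group) / len(group), len(group)
```

For example, calling `estimate_malware_rate(990, 10_000, 0.9, 0.01, 0.05)` asks which malware rates would plausibly yield 990 detections in 10,000 samples for a detector with a 0.9 true positive rate and a 0.01 false positive rate. Selecting only synthetic data sets that reproduce the observed count resembles rejection-style approximate Bayesian computation, which is one possible reading of the selection step rather than the claimed method itself.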