US 12,073,293 B2
Machine learning model evaluation in cyber defense
Eamon Hirata Jordan, Honolulu, HI (US); Chad Kumao Takahashi, Honolulu, HI (US); and Ryan Susumu Ito, Wahiawa, HI (US)
Assigned to Resurgo, LLC, Honolulu, HI (US)
Filed by Resurgo, LLC, Honolulu, HI (US)
Filed on Jul. 23, 2020, as Appl. No. 16/937,568.
Application 16/937,568 is a division of application No. 15/373,425, filed on Dec. 8, 2016, granted, now Pat. No. 10,733,530.
Prior Publication US 2020/0364620 A1, Nov. 19, 2020
Int. Cl. G06N 20/00 (2019.01); G06F 21/55 (2013.01); G06N 7/01 (2023.01); H04L 9/40 (2022.01); H04L 43/08 (2022.01); H04L 43/16 (2022.01)

CPC G06N 20/00 (2019.01) [G06F 21/552 (2013.01); G06N 7/01 (2023.01); H04L 43/08 (2013.01); H04L 43/16 (2013.01); H04L 63/1425 (2013.01)]

1 Claim

1. A process for improving sensors that use machine learning models previously trained on previous training data, for defending a computer network against cyber attacks in live network traffic that contains both normal network traffic and said cyber attacks, to determine more accurately whether and when to react to intrusion detection alert logs, comprising:

installing said sensors in said computer network;

determining model fit of said models by applying techniques of anomaly detection, measuring similarity between said live network traffic and said previous training data, and determining model overfit;

assigning thresholds, and aggregating results of whether said models are above or below said thresholds, for each of said techniques of anomaly detection, measuring similarity between said live network traffic and said previous training data, and determining model overfit, to identify model fit in real time of said live network traffic with said previous training data on a scale of model fits;

activating model retraining based on said scale of model fits, wherein said model retraining selects an optimal model for distinguishing between said cyber attacks and said normal network traffic; and

reinstalling said optimal model in said sensors, to perform real time evaluation of said live network traffic, including anomaly detection,

wherein said model retraining step is performed by:

obtaining samples of said normal network traffic from said network;

providing samples of said cyber attacks from said network or from a repository of cyber attacks, wherein said samples of cyber attacks constitute known attacks;

sample classifying each of said samples as either said normal network traffic or said known attacks, to create ground truths for said samples;

splitting said samples into a training set and a test set, with each of said sets containing samples of said normal network traffic and said known attacks;

using a model generating algorithm to generate a variety of models for distinguishing between said normal network traffic and said known attacks in said training set;

obfuscating a portion of said known attacks in said training set to create obfuscated attack samples;

adding said obfuscated attack samples to said test set to form an enhanced test set;

performing statistical analysis on performance of said models with said enhanced test set to determine intrusion detection capability error; and

selecting one of said models that optimizes a desired model parameter as said optimal model.
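
The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation, and every name in it (`fit_scale`, `obfuscate`, `retrain`, the feature-mean cut-off "models", the jitter-based obfuscation) is a hypothetical stand-in chosen for illustration. In particular, a plain misclassification rate is used here in place of the claim's "intrusion detection capability error", whose exact definition the claim text does not give, and the "scale of model fits" is assumed to be a simple count of checks that exceed their thresholds.

```python
import random


def fit_scale(checks):
    """Aggregate threshold results into a hypothetical scale of model fits.

    `checks` maps a technique name (e.g. anomaly detection, similarity,
    overfit) to a (score, threshold) pair. The result counts the checks
    whose score exceeds its threshold: 0 means every check passed.
    """
    return sum(1 for score, threshold in checks.values() if score > threshold)


def feature_mean(sample):
    """Toy feature: the mean of a sample's numeric values."""
    return sum(sample) / len(sample)


def obfuscate(sample, rng):
    """Toy obfuscation (an assumption): jitter each value by up to 0.5."""
    return [x + rng.uniform(-0.5, 0.5) for x in sample]


def error_rate(cut, labeled):
    """Fraction of labeled samples misclassified by the rule mean > cut."""
    wrong = sum(1 for s, is_attack in labeled
                if (feature_mean(s) > cut) != is_attack)
    return wrong / len(labeled)


def retrain(normal, attacks, seed=0):
    """Sketch of the retraining step; returns (best_cut, error)."""
    rng = random.Random(seed)

    # Split so that each set contains both normal traffic and known attacks.
    train_normal, test_normal = normal[::2], normal[1::2]
    train_attacks, test_attacks = attacks[::2], attacks[1::2]

    # Generate a variety of candidate models from the training set:
    # cut-offs placed between the normal and attack training means.
    m_n = sum(map(feature_mean, train_normal)) / len(train_normal)
    m_a = sum(map(feature_mean, train_attacks)) / len(train_attacks)
    cuts = [m_n + w * (m_a - m_n) for w in (0.25, 0.5, 0.75)]

    # Obfuscate a portion of the training attacks and add them to the
    # test set to form the enhanced test set.
    portion = train_attacks[: max(1, len(train_attacks) // 2)]
    obfuscated = [obfuscate(s, rng) for s in portion]
    enhanced_test = ([(s, False) for s in test_normal]
                     + [(s, True) for s in test_attacks + obfuscated])

    # Select the candidate with the lowest error on the enhanced test set.
    best_cut = min(cuts, key=lambda c: error_rate(c, enhanced_test))
    return best_cut, error_rate(best_cut, enhanced_test)
```

In such a sketch, a caller might trigger `retrain` only when `fit_scale` returns a value above some chosen level, mirroring the claim's "activating model retraining based on said scale of model fits"; the trigger level itself would be a deployment choice that the claim does not fix.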