US 12,088,611 B1
Systems and methods for training a machine learning model to detect beaconing communications
Cui Lin, Los Altos, CA (US); and Stanislav Miskovic, San Jose, CA (US)
Assigned to Splunk Inc., San Francisco, CA (US)
Filed by SPLUNK Inc., San Francisco, CA (US)
Filed on Jan. 11, 2022, as Appl. No. 17/573,399.
Int. Cl. H04L 9/40 (2022.01); G06F 18/214 (2023.01); G06N 20/00 (2019.01)
CPC H04L 63/1425 (2013.01) [G06F 18/214 (2023.01); G06N 20/00 (2019.01); H04L 63/1416 (2013.01); H04L 63/1466 (2013.01); H04L 63/166 (2013.01); H04L 63/20 (2013.01)] 20 Claims
1. A computerized method comprising:
accessing an initial set of historical network traffic data from a data store, wherein the historical network traffic data represents transmission of data between source devices and destination devices;
preparing a training set of data prior to training a machine learning model, from the initial set of data, by:
applying a plurality of operations to the initial set of historical network traffic data to obtain a plurality of filtered subsets of network transmissions, wherein each filtered subset of network transmissions represents a corresponding set of beaconing candidates and is labeled by at least a security expert or a machine learning model to form a plurality of sets of labeled results,
wherein the plurality of sets of labeled results are augmented to form an augmented labeled training set, and
storing the augmented labeled training set;
applying a first clustering filter rule to the initial set of historical network traffic data to obtain a first filtered subset of network transmissions that represent a first set of beaconing candidates;
performing a clustering logic to generate a set of one or more clusters from the first set of beaconing candidates;
applying a multivariate anomaly detection logic to the set of one or more clusters to detect and extract outliers in the first set of beaconing candidates;
providing an outlier alert to a system administrator indicating that the outliers have been determined to indicate a presence of beaconing, wherein extraction of the outliers results in a remaining set of beaconing candidates and a sampling subset from each cluster of the remaining set of beaconing candidates is labeled by the security expert to form a first set of labeled results; and
training the machine learning model using the augmented labeled training set, the machine learning model being subsequently used to classify data.
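The filtering, clustering, and outlier-extraction steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not specify the filter rule, the clustering algorithm, or the anomaly detector, so every choice here is an assumption. Candidate (source, destination) pairs are kept when their connection inter-arrival times are regular (low coefficient of variation). Candidates are grouped by single-linkage clustering on the mean beacon period. Outliers within each cluster are scored by standardized Euclidean distance from the cluster mean, a diagonal-covariance stand-in for Mahalanobis distance. The names Flow, candidate_features, link_clusters, cluster_outliers, and beacon_outliers, and all thresholds, are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Flow:
    """One observed connection (hypothetical record layout)."""
    src: str
    dst: str
    ts: float     # connection start time in seconds
    nbytes: int   # bytes sent from src to dst


def candidate_features(flows, min_conns=5, max_cv=0.2):
    """Assumed filter rule: keep (src, dst) pairs with at least `min_conns`
    connections whose inter-arrival times have a coefficient of variation
    <= max_cv. Returns {pair: (mean_gap, cv, mean_bytes)}."""
    by_pair = {}
    for f in flows:
        by_pair.setdefault((f.src, f.dst), []).append(f)
    feats = {}
    for pair, fs in by_pair.items():
        if len(fs) < min_conns:
            continue
        ts = sorted(f.ts for f in fs)
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        mu = mean(gaps)
        if mu <= 0:
            continue
        cv = pstdev(gaps) / mu
        if cv <= max_cv:
            feats[pair] = (mu, cv, mean([f.nbytes for f in fs]))
    return feats


def link_clusters(vectors, eps):
    """Single-linkage clustering: keys whose vectors lie within `eps`
    (Euclidean) of each other, directly or through a chain, share a cluster."""
    keys, seen, clusters = list(vectors), set(), []
    for start in keys:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            cur = stack.pop()
            members.append(cur)
            for other in keys:
                if other not in seen and math.dist(vectors[cur], vectors[other]) <= eps:
                    seen.add(other)
                    stack.append(other)
        clusters.append(members)
    return clusters


def cluster_outliers(members, feats, threshold=2.5):
    """Flag members whose standardized distance from the cluster mean exceeds
    `threshold`; features with zero spread in the cluster are ignored."""
    if len(members) < 3:
        return []
    cols = list(zip(*(feats[k] for k in members)))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    flagged = []
    for k in members:
        z2 = sum(((x - m) / s) ** 2 for x, m, s in zip(feats[k], mus, sds) if s > 0)
        if math.sqrt(z2) > threshold:
            flagged.append(k)
    return flagged


def beacon_outliers(flows, eps=5.0, threshold=2.5):
    """Filter, cluster on mean period, then extract per-cluster outliers.
    Returns (outliers, remaining clusters with the outliers removed)."""
    feats = candidate_features(flows)
    periods = {k: (v[0],) for k, v in feats.items()}
    clusters = link_clusters(periods, eps)
    outliers = [k for c in clusters for k in cluster_outliers(c, feats, threshold)]
    remaining = [[k for k in c if k not in outliers] for c in clusters]
    return outliers, remaining


# Usage: ten hosts contact "c2" every 60 s; h9 sends far larger payloads.
flows = [Flow(f"h{i}", "c2", t * 60.0, 5000 if i == 9 else 500)
         for i in range(10) for t in range(5)]
# An irregular pair that the assumed filter rule should drop.
flows += [Flow("h0", "cdn", float(t), 800) for t in (0, 5, 100, 101, 300)]
outliers, remaining = beacon_outliers(flows)
```

Clustering on the period alone and scoring anomalies on all features is a deliberate choice in this sketch: a payload-size outlier then stays inside its timing cluster, where the multivariate score can isolate it, instead of being split off into a singleton cluster by the linkage step.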
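The labeling side of claim 1 can likewise be sketched in Python. This covers drawing a sampling subset from each remaining cluster for the security expert and combining expert and model labels into an augmented training set. The claim does not say how samples are drawn, how conflicting labels are reconciled, or how augmentation is performed. Uniform per-cluster sampling, expert-over-model precedence, and multiplicative feature jitter are assumptions made here. Labeled, sample_per_cluster, merge_labeled, and augment are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A labeled beaconing candidate (hypothetical record layout)."""
    pair: tuple       # (source, destination)
    features: tuple   # e.g. (mean_gap_s, cv, mean_bytes)
    label: int        # 1 = beaconing, 0 = benign
    source: str       # "expert" or "model"


def sample_per_cluster(clusters, per_cluster, seed=0):
    """Draw up to `per_cluster` members from every cluster, so each group of
    similar candidates is represented in the batch sent for expert review."""
    rng = random.Random(seed)
    picked = []
    for members in clusters:
        picked.extend(rng.sample(members, min(per_cluster, len(members))))
    return picked


def merge_labeled(*label_sets):
    """Combine several labeled sets keyed by (source, destination) pair;
    an expert label replaces a model label for the same pair."""
    merged = {}
    for label_set in label_sets:
        for item in label_set:
            prev = merged.get(item.pair)
            if prev is None or (prev.source == "model" and item.source == "expert"):
                merged[item.pair] = item
    return list(merged.values())


def augment(items, copies=2, jitter=0.05, seed=0):
    """Assumed augmentation: for each item, add `copies` variants whose
    feature values are scaled by a random factor in [1 - jitter, 1 + jitter]."""
    rng = random.Random(seed)
    out = list(items)
    for item in items:
        for _ in range(copies):
            feats = tuple(x * rng.uniform(1 - jitter, 1 + jitter) for x in item.features)
            out.append(Labeled(item.pair, feats, item.label, item.source))
    return out


# Usage: sample two candidates per cluster, then combine model and expert labels.
clusters = [[("h0", "c2"), ("h1", "c2"), ("h2", "c2")], [("h3", "cdn")]]
to_review = sample_per_cluster(clusters, per_cluster=2)
expert = [Labeled(("h0", "c2"), (60.0, 0.0, 500.0), 1, "expert")]
model = [Labeled(("h0", "c2"), (60.0, 0.0, 500.0), 0, "model"),
         Labeled(("h3", "cdn"), (3600.0, 0.1, 800.0), 0, "model")]
training_set = augment(merge_labeled(model, expert))
```

Reconciling labels per (source, destination) pair before augmenting keeps a single label per candidate, so the jittered copies never carry contradictory labels into training.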