US 11,658,989 B1
Method and device for identifying unknown traffic data based dynamic network environment
Zhaoyun Ding, Hunan (CN); Hang Zhang, Hunan (CN); Deqi Cao, Hunan (CN); Weike Liu, Hunan (CN); Yi Liu, Hunan (CN); Xianqiang Zhu, Hunan (CN); Cheng Zhu, Hunan (CN); Yun Zhou, Hunan (CN); Songping Huang, Hunan (CN); and Bin Liu, Hunan (CN)
Assigned to National University of Defense Technology, Changsha (CN)
Filed by National University of Defense Technology, Hunan (CN)
Filed on Sep. 27, 2022, as Appl. No. 17/953,665.
Claims priority of application No. 202210036819.8 (CN), filed on Jan. 13, 2022.
Int. Cl. H04L 9/40 (2022.01); G06N 5/022 (2023.01)
CPC H04L 63/1416 (2013.01) [G06N 5/022 (2013.01)] 9 Claims
1. A method for identifying unknown traffic data based on a dynamic network environment, comprising:
acquiring a network traffic data set to be identified, the network traffic data set including a plurality of known traffic data and/or unknown traffic data, both the known traffic data and the unknown traffic data containing normal traffic data and malicious traffic data;
pre-processing the network traffic data set based on traffic feature ordering to obtain traffic data features in multiple dimensions;
inputting the traffic data features in the multiple dimensions into a known network traffic classification model to predict a class of respective traffic data in the network traffic data set, and outputting a respective class prediction result;
performing preliminary determination for unknown traffic data on traffic data corresponding to the class prediction result according to a self-adaptive confidence principle to obtain the unknown traffic data; classifying the unknown traffic data into different classes according to an adaptive clustering method, and initially labeling the unknown traffic data according to the divided classes;
identifying a class of the unknown traffic data according to a similarity coefficient estimation method to obtain classes for malicious traffic and normal traffic in the unknown traffic data, wherein in identifying the class of the unknown traffic data according to the similarity coefficient estimation method to obtain classes for malicious traffic and normal traffic in the unknown traffic data, similarity of the traffic data is estimated by calculating K-L divergence indexes of two types of traffic data features; and
training and updating the known network traffic classification model with malicious traffic data and normal traffic data identified in the unknown traffic data as known traffic data, wherein the known network traffic classification model is trained and constantly updated with new known traffic data such that the known network traffic classification model learns and is trained with emerging new network data;
the known network traffic classification model is configured to continuously improve in ability to identify unknown traffic data;
the known network traffic classification model comprises a deep neural network, the deep neural network comprises a convolutional neural network (CNN), the CNN supports learning of data features in different dimensions in a process of training the known network traffic classification model; an integrated classification model for known network traffic M_ensemble is used to perform prediction on the network traffic data set to be identified, wherein M_ensemble comprises three parts which respectively use different training feature sets, the three parts comprise a traffic classification subnetwork model M_n using only known normal traffic F_Hn = {1d_Hn, 2d_Hn, 3d_Hn}, where 1d_Hn, 2d_Hn, and 3d_Hn are known normal training features, a traffic classification subnetwork model M_p using only known malicious traffic F_Hp = {1d_Hp, 2d_Hp, 3d_Hp}, where 1d_Hp, 2d_Hp, and 3d_Hp are known malicious training features, and a traffic classification subnetwork model M, using both the known normal traffic and the known malicious traffic as training features;
each of the three parts comprises CNN models in three dimensions, and sample probabilities of respective Softmax layers of the three CNN models are fused by a decision information fusion layer;
an optimal feature pre-processing strategy selected in a process of training the integrated classification model for the known traffic is used to pre-process the traffic data to obtain one-dimensional features, two-dimensional features and three-dimensional features, and the known normal traffic classification model M_n, the known malicious traffic classification model M_p and the known traffic classification model M are used to perform prediction on the one-dimensional features, the two-dimensional features and the three-dimensional features respectively;
based on characteristics of the deep neural network classification model with high prediction confidence for trained samples and low prediction confidence for unknown samples, the known network traffic classification model is used to perform prediction on the network traffic data set; and an adaptive confidence threshold e is set, and unknown network traffic data whose predicted class has low confidence is screened out from mixed network packets, so that the preliminary determination for the unknown traffic data is made for the corresponding traffic data according to the class prediction result, thereby identifying malicious traffic in the unknown traffic data with enhanced accuracy.
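
As an illustration only, three steps recited in claim 1 (fusing Softmax probabilities, screening low-confidence samples against a confidence threshold, and comparing traffic feature distributions by K-L divergence) might be sketched in Python as follows. The claim does not specify the fusion rule, how the threshold adapts, or how features are turned into distributions, so several choices here are assumptions rather than the patented method: plain averaging as the fusion rule, a fixed threshold value, normalized histograms as inputs to the divergence, and all function names (`fuse_softmax`, `is_unknown`, `kl_divergence`, `closest_class`).

```python
import math
from typing import Dict, List, Sequence


def fuse_softmax(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-class Softmax outputs of several models.

    Assumption: simple averaging; the claim only says the probabilities
    are fused by a decision information fusion layer.
    """
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def is_unknown(fused: Sequence[float], threshold: float) -> bool:
    """Flag a sample as unknown when its highest fused probability
    falls below the confidence threshold (fixed here, adaptive in the claim)."""
    return max(fused) < threshold


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """D_KL(P || Q) for two discrete distributions over the same bins.

    eps guards against division by zero and log(0) for empty bins.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def closest_class(cluster_hist: Sequence[float],
                  reference_hists: Dict[str, Sequence[float]]) -> str:
    """Return the reference class whose feature histogram has the
    smallest K-L divergence from the cluster's histogram."""
    return min(reference_hists,
               key=lambda name: kl_divergence(cluster_hist, reference_hists[name]))


# Hypothetical Softmax outputs of three models for two samples.
confident = fuse_softmax([[0.9, 0.1], [0.8, 0.2], [0.95, 0.05]])
uncertain = fuse_softmax([[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]])

# Hypothetical normalized feature histograms for the known classes.
references = {
    "normal": [0.6, 0.3, 0.1],
    "malicious": [0.1, 0.2, 0.7],
}
label = closest_class([0.7, 0.2, 0.1], references)
```

With a threshold of 0.7, the first sample would be treated as known traffic and the second would be screened out as unknown; the example cluster histogram lies closer to the "normal" reference than to the "malicious" one. Note that K-L divergence is asymmetric, so the direction used here (cluster relative to reference) is itself a design choice.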