CPC H04L 63/1433 (2013.01) [G06F 11/3072 (2013.01); G06F 18/23 (2023.01); G06N 3/04 (2013.01); H04L 63/1416 (2013.01); H04L 63/1425 (2013.01)] | 11 Claims |
1. A method for analyzing and clustering darknet traffic streams with word embeddings, comprising:
a) collecting data from blackhole taps of said darknet, being unassigned IP addresses;
b) splitting the collected data into sliding time windows, each having a predetermined length;
c) for each time window, grouping the destination port (D-port) records of the same source IP (S-IP) into a port sequence, to obtain a plurality of port sequences;
d) transforming said port sequences into numerical feature vectors by applying a word embedding algorithm to said port sequences, by treating ports as words and port sequences as sentences;
e) clustering said feature vectors over time by performing temporal clustering; and
f) upon identifying clusters that have appeared and been classified as malicious in the past, or clusters that have never been seen before, issuing an alert.
|
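Steps b) and c) of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the record layout `(timestamp, src_ip, dst_port)`, the function name `port_sequences`, and the `window_len` and `step` parameters are assumptions made for illustration only. The output is one "sentence" of port "words" per source IP and window, which is the form a word embedding algorithm in step d) would consume.

```python
from collections import defaultdict


def port_sequences(records, window_len, step):
    """Group destination ports by source IP inside sliding time windows.

    records: iterable of (timestamp, src_ip, dst_port) tuples (assumed layout).
    Returns {window_start: {src_ip: [port_as_str, ...]}}.
    """
    records = sorted(records)  # order by timestamp
    if not records:
        return {}
    first_ts, last_ts = records[0][0], records[-1][0]
    windows = {}
    start = first_ts
    while start <= last_ts:
        seqs = defaultdict(list)
        for ts, ip, port in records:
            # A record belongs to every window whose span covers it.
            if start <= ts < start + window_len:
                seqs[ip].append(str(port))  # ports become "words"
        if seqs:
            windows[start] = dict(seqs)
        start += step  # step < window_len makes the windows overlap
    return windows


# Hypothetical darknet records: (seconds, source IP, destination port).
packets = [
    (0, "10.0.0.1", 23),
    (5, "10.0.0.2", 445),
    (12, "10.0.0.1", 2323),
    (31, "10.0.0.1", 80),
    (40, "10.0.0.2", 445),
]
result = port_sequences(packets, window_len=30, step=15)
```

Each inner list can then be passed as a sentence to a word embedding model, and the resulting vectors clustered per window, as steps d) and e) describe.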