US 11,695,794 B2
Method and system for clustering darknet traffic streams with word embeddings
Dvir Cohen, Efrat (IL); Asaf Shabtai, Hulda (IL); Yuval Elovici, Arugot (IL); Yisroel Avraham Mirsky, Beer Sheva (IL); Rami Puzis, Ashdod (IL); Tobias Martin, Darmstadt (DE); and Manuel Kamp, Darmstadt (DE)
Assigned to DEUTSCHE TELEKOM AG, Bonn (DE)
Filed by DEUTSCHE TELEKOM AG, Bonn (DE)
Filed on Apr. 2, 2020, as Appl. No. 16/838,136.
Claims priority of provisional application 62/828,528, filed on Apr. 3, 2019.
Prior Publication US 2020/0322368 A1, Oct. 8, 2020
Int. Cl. G06F 11/30 (2006.01); H04L 9/40 (2022.01); G06N 3/04 (2023.01); G06F 18/23 (2023.01)

CPC H04L 63/1433 (2013.01) [G06F 11/3072 (2013.01); G06F 18/23 (2023.01); G06N 3/04 (2013.01); H04L 63/1416 (2013.01); H04L 63/1425 (2013.01)]

11 Claims

1. A method for analyzing and clustering darknet traffic streams with word embeddings, comprising:

a) collecting data from blackhole taps of said darknet, being unassigned IP addresses;

b) splitting the collected data into sliding time windows, each having a predetermined length;

c) for each time window, grouping the destination port (D-port) records of the same source IP (S-IP) into a port sequence, to obtain a plurality of port sequences;

d) transforming said port sequences into numerical feature vectors by applying a word embedding algorithm to said port sequences, by treating ports as words and port sequences as sentences;

e) clustering said feature vectors over time by performing temporal clustering; and

f) upon identifying clusters that have appeared and been classified as malicious in the past or clusters that have never been seen before, issuing an alert.
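
Steps b) and c) of the claim above can be illustrated with a minimal sketch. It is not the patented implementation; it only shows one plausible way to split records into sliding time windows and group destination ports by source IP. The names `Record`, `port_sequences`, `window_len`, and `step` are hypothetical, and the fixed step by which the windows advance is an assumption (the claim specifies only a predetermined window length).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (timestamp in seconds, source IP, destination port).
Record = Tuple[float, str, int]


def port_sequences(
    records: Iterable[Record], window_len: float, step: float
) -> Dict[float, Dict[str, List[int]]]:
    """Return {window_start: {source_ip: [dest_port, ...]}}.

    Each window covers [window_start, window_start + window_len).
    Windows advance by `step`, so they overlap when step < window_len.
    """
    ordered = sorted(records, key=lambda r: r[0])  # chronological order
    if not ordered:
        return {}

    first_ts, last_ts = ordered[0][0], ordered[-1][0]
    windows: Dict[float, Dict[str, List[int]]] = {}

    start = first_ts
    while start <= last_ts:
        seqs: Dict[str, List[int]] = defaultdict(list)
        for ts, src_ip, dst_port in ordered:
            if start <= ts < start + window_len:
                # Ports of the same source IP form one "sentence" of "words".
                seqs[src_ip].append(dst_port)
        if seqs:
            windows[start] = dict(seqs)
        start += step
    return windows


# Illustrative data only, not taken from any real capture.
sample = [
    (0, "10.0.0.1", 23),
    (5, "10.0.0.2", 445),
    (12, "10.0.0.1", 2323),
    (18, "10.0.0.1", 80),
    (25, "10.0.0.2", 445),
]
windows = port_sequences(sample, window_len=20, step=10)
```

Each inner list is a port sequence in the sense of step c). In step d) these lists would be passed as sentences to a word embedding algorithm, with each port number treated as a word.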