US 12,218,957 B2
Volumetric clustering on large-scale DNS data
Pengxiang Xu, Huntington Beach, CA (US); Vaisakhi Mishra, White Plains, NY (US); Annamaria Balazs, Austin, TX (US); and Cheng-Ta Lee, Chamblee, GA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Mar. 21, 2022, as Appl. No. 17/655,604.
Prior Publication US 2023/0300151 A1, Sep. 21, 2023
Int. Cl. H04L 9/40 (2022.01); G06N 5/022 (2023.01)

CPC H04L 63/1416 (2013.01) [G06N 5/022 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

receiving, by one or more processors, a first set of domains, wherein the first set of domains comprises a set of normal domains, a set of suspicious domains, and a set of malicious domains;

labelling, by the one or more processors, each domain of the set of normal domains as normal to produce a labelled set of normal domains and each domain of the set of suspicious domains and the set of malicious domains as malicious to produce a labelled set of malicious domains;

sampling, by the one or more processors, a preset percentage of the labelled set of normal domains to produce a sampled set of normal domains;

aggregating, by the one or more processors, the sampled set of normal domains and the labelled set of malicious domains based on a number of hits for each domain to produce a first set of aggregated domains;

filtering, by the one or more processors, the first set of aggregated domains using a hit size filter, an inter-arrival-time filter, and a univariate volumetric filter to produce a first set of filtered domains; and

determining, by the one or more processors, a cluster of a set of clusters to which each of the first set of filtered domains is to be assigned using a trained K-shape model.
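
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation. The claim does not define the three filters, the aggregation window, or how the trained K-shape model is stored, so the following are hypothetical: all function and parameter names; the hit size filter as a minimum total hit count; the inter-arrival-time filter as a maximum mean gap between hits; the univariate volumetric filter as a minimum peak per-window count; and the trained model as a plain list of centroid series. Cluster assignment uses the shape-based distance (SBD), the distance measure of the k-Shape algorithm, computed here without circular shifts on equal-length series.

```python
import random
from statistics import mean, pstdev

def label_domains(normal, suspicious, malicious):
    """Label normal domains 'normal'; suspicious and malicious both 'malicious'."""
    labels = {d: "normal" for d in normal}
    labels.update({d: "malicious" for d in list(suspicious) + list(malicious)})
    return labels

def sample_normal(labels, percentage, seed=0):
    """Keep a preset percentage of normal domains and every malicious domain."""
    normal = sorted(d for d, lab in labels.items() if lab == "normal")
    k = round(len(normal) * percentage / 100)
    kept = set(random.Random(seed).sample(normal, k))
    return {d: lab for d, lab in labels.items() if lab == "malicious" or d in kept}

def aggregate_hits(events, domains, window, n_windows):
    """Count hits per domain per time window (assumed aggregation scheme).

    events is an iterable of (timestamp, domain) pairs.
    """
    series = {d: [0] * n_windows for d in domains}
    times = {d: [] for d in domains}
    for ts, dom in events:
        idx = int(ts // window)
        if dom in series and 0 <= idx < n_windows:
            series[dom][idx] += 1
            times[dom].append(ts)
    return series, times

def filter_domains(series, times, min_hits, max_mean_iat, min_peak):
    """Apply three filters; their definitions here are assumptions."""
    kept = {}
    for dom, counts in series.items():
        ts = sorted(times.get(dom, []))
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        if sum(counts) < min_hits:                 # hit size filter (assumed)
            continue
        if not gaps or mean(gaps) > max_mean_iat:  # inter-arrival-time filter (assumed)
            continue
        if max(counts) < min_peak:                 # univariate volumetric filter (assumed)
            continue
        kept[dom] = counts
    return kept

def znorm(x):
    """Z-normalize a series; a constant series maps to all zeros."""
    mu, sd = mean(x), pstdev(x)
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]

def sbd(x, y):
    """Shape-based distance: 1 minus the maximum normalized cross-correlation."""
    n = len(x)
    norm = (sum(v * v for v in x) * sum(v * v for v in y)) ** 0.5
    if norm == 0:
        return 1.0
    best = max(
        sum(x[i] * y[i - s] for i in range(max(0, s), min(n, n + s)))
        for s in range(-(n - 1), n)
    )
    return 1.0 - best / norm

def assign_clusters(filtered, centroids):
    """Assign each domain to the index of its nearest centroid under SBD."""
    return {
        dom: min(range(len(centroids)),
                 key=lambda k: sbd(znorm(counts), znorm(centroids[k])))
        for dom, counts in filtered.items()
    }
```

In use, the output of each function feeds the next: the labels go to sample_normal, the sampled domains and a log of (timestamp, domain) hits go to aggregate_hits, the resulting series go to filter_domains, and the surviving series go to assign_clusters together with centroids taken from a previously trained model. Z-normalizing before computing SBD makes the assignment depend on the shape of the hit-count series rather than on its absolute volume.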