US 11,914,705 B2
Clustering and cluster tracking of categorical data
Michael A. Betser, Kirkland, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jun. 30, 2020, as Appl. No. 16/917,589.
Prior Publication US 2021/0406366 A1, Dec. 30, 2021
Int. Cl. G06F 21/00 (2013.01); G06F 21/55 (2013.01); G06F 16/28 (2019.01); G06F 21/56 (2013.01)

CPC G06F 21/554 (2013.01) [G06F 16/285 (2019.01); G06F 21/56 (2013.01); G06F 2221/034 (2013.01)]

17 Claims

1. A computer-implemented threat-protection method for processing a plurality of text-based electronic messages, the method comprising:

mapping the plurality of text-based electronic messages onto a plurality of initial data points each represented by a multi-dimensional categorical vector comprising a fixed number of vector elements;

initializing a cluster set of data points representing cluster centers with the plurality of initial data points, the initial data points serving as initial cluster centers of singleton clusters;

iteratively clustering the data points in the cluster set in two or more iterations, each iteration comprising:

creating an ordering of the data points in the cluster set based on a permutation of the vector elements within the multi-dimensional categorical vectors,

partitioning the data points in the cluster set into at least two blocks based on the ordering,

clustering the data points within each of the at least two blocks, and,

for each cluster of at least one cluster created as a result of the clustering, replacing, in the cluster set of data points, all individual data points within the cluster by a new cluster center determined from the individual data points within the cluster, the new cluster center being represented by a multi-dimensional categorical vector,

wherein the permutations differ between different iterations;

determining a threat associated with a cluster of text-based electronic messages formed during the iterative clustering; and

performing a mitigating action based on the determined threat.
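
The iterative clustering recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the names `hamming`, `center_of`, `cluster_block`, and `iterative_cluster` are hypothetical; the ordering is a lexicographic sort under a rotated element order (so permutations differ as long as the iteration count does not exceed the vector length); blocks are contiguous chunks of the ordering; clustering within a block is a greedy Hamming-distance grouping; and the new center is the per-element mode of the merged centers. Mapping messages to vectors, threat determination, and mitigation are left out.

```python
import math
from collections import Counter

def hamming(a, b):
    # Number of vector elements at which two categorical vectors differ.
    return sum(x != y for x, y in zip(a, b))

def center_of(vectors):
    # Assumed center rule: most common value at each element position.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*vectors))

def cluster_block(block, max_dist):
    # Assumed within-block rule: join the first group whose first
    # member's center is within max_dist, else start a new group.
    groups = []
    for item in block:
        for group in groups:
            if hamming(group[0][0], item[0]) <= max_dist:
                group.append(item)
                break
        else:
            groups.append([item])
    return groups

def iterative_cluster(vectors, iterations=3, num_blocks=2, max_dist=1):
    dims = len(vectors[0])  # assumes a non-empty input of equal-length vectors
    # Cluster set: each entry is (center vector, indices of member messages).
    # Every initial data point starts as the center of a singleton cluster.
    cluster_set = [(tuple(v), [i]) for i, v in enumerate(vectors)]
    for it in range(iterations):
        # A different element permutation per iteration (rotation by `it`).
        perm = [(j + it) % dims for j in range(dims)]
        # Ordering: lexicographic sort of centers under the permuted elements.
        ordered = sorted(cluster_set, key=lambda c: tuple(c[0][p] for p in perm))
        # Partition the ordering into contiguous blocks.
        size = max(1, math.ceil(len(ordered) / num_blocks))
        blocks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
        next_set = []
        for block in blocks:
            for group in cluster_block(block, max_dist):
                # Replace the grouped data points by one new cluster center.
                center = center_of([c for c, _ in group])
                members = [m for _, ids in group for m in ids]
                next_set.append((center, members))
        cluster_set = next_set
    return cluster_set

# Hypothetical categorical vectors standing in for mapped messages.
messages = [
    ("a", "x", "1"), ("a", "x", "2"), ("b", "y", "3"),
    ("b", "y", "3"), ("a", "x", "1"), ("c", "z", "9"),
]
clusters = iterative_cluster(messages)
for center, members in clusters:
    print(center, sorted(members))
```

Because each iteration only compares data points that fall into the same block, changing the permutation between iterations gives points that were separated by one ordering a chance to land in the same block under another. If the cluster set shrinks to a single data point, this sketch produces only one block, whereas the claim recites at least two.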