US 12,411,948 B2
Clustering and cluster tracking of categorical data
Michael A. Betser, Kirkland, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jan. 17, 2024, as Appl. No. 18/414,994.
Application 18/414,994 is a continuation of application No. 16/917,589, filed on Jun. 30, 2020, granted, now Pat. No. 11,914,705.
Prior Publication US 2024/0193269 A1, Jun. 13, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 21/00 (2013.01); G06F 16/28 (2019.01); G06F 21/55 (2013.01); G06F 21/56 (2013.01)
CPC G06F 21/554 (2013.01) [G06F 16/285 (2019.01); G06F 21/56 (2013.01); G06F 2221/034 (2013.01)]
20 Claims
 
1. A computer-implemented method for processing data items, the method comprising:
representing the data items as initial data points represented by vectors comprising respective ordered sets of vector elements;
initializing a cluster set of data points with the initial data points, the data points in the cluster set representing cluster centers and the initial data points serving as initial cluster centers of singleton clusters;
iteratively updating the cluster set of data points in a loop comprising two or more iterations, utilizing respective permutations of the vector elements within the vectors, the permutations differing between different iterations, the iterations comprising:
creating an ordering of the data points in the cluster set based on the respective permutation of the vector elements within the vectors,
partitioning the data points in the cluster set into at least two blocks based on the ordering,
separately clustering the data points within the at least two blocks to create new clusters of individual data points,
determining, from the individual data points within the new clusters, respective new cluster centers, and,
creating an updated cluster set of data points by replacing the individual data points within the clusters by the respective new cluster centers, the new cluster centers being represented by respective vectors;
upon exiting the loop, outputting one or more final clusters of the data items based on the updated cluster set of data points; and
for at least one of the one or more final clusters, performing a common action on the data items within the final cluster.
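
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim leaves the permutation scheme, the block partitioning, the within-block clustering method, and the cluster-center rule open, so every concrete choice below is an assumption made for illustration. Specifically, the sketch assumes cyclic rotations as the permutations (these differ between consecutive iterations when vectors have at least two elements), lexicographic sorting as the ordering, contiguous equal-sized blocks, a greedy Hamming-distance threshold for clustering within a block, and a size-weighted per-position mode as the new cluster center. The names `hamming`, `cluster_block`, `center_of`, `permutation_cluster`, `num_blocks`, and `max_dist` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_block(block, max_dist):
    """Greedy clustering inside one block (assumed method, not from the claim).

    Each entry is (vector, member_indices). An entry joins the first group
    whose first vector is within max_dist, otherwise it starts a new group.
    """
    groups = []
    for entry in block:
        for group in groups:
            if hamming(group[0][0], entry[0]) <= max_dist:
                group.append(entry)
                break
        else:
            groups.append([entry])
    return groups


def center_of(group):
    """Per-position mode, weighted by how many data items each point represents."""
    dim = len(group[0][0])
    center = []
    for i in range(dim):
        votes = Counter()
        for vec, members in group:
            votes[vec[i]] += len(members)
        center.append(votes.most_common(1)[0][0])
    return tuple(center)


def permutation_cluster(items, iterations=3, num_blocks=2, max_dist=1):
    """Return a list of (center_vector, member_indices) pairs."""
    dim = len(items[0])
    # Initialize the cluster set: every data item is a singleton cluster center.
    cluster_set = [(tuple(v), [i]) for i, v in enumerate(items)]
    for it in range(iterations):
        # Assumed permutation: rotate the element order by the iteration index.
        perm = [(j + it) % dim for j in range(dim)]
        # Order the data points by their permuted vectors.
        cluster_set.sort(key=lambda e: tuple(e[0][j] for j in perm))
        # Partition the ordering into contiguous blocks. Once the cluster set
        # shrinks to a single point, only one block remains.
        size = max(1, -(-len(cluster_set) // num_blocks))
        blocks = [cluster_set[k:k + size]
                  for k in range(0, len(cluster_set), size)]
        updated = []
        for block in blocks:
            # Cluster each block separately, then replace every new cluster
            # by its center while carrying the member indices along.
            for group in cluster_block(block, max_dist):
                members = [i for _, m in group for i in m]
                updated.append((center_of(group), members))
        cluster_set = updated
    return cluster_set


def apply_common_action(clusters, action):
    """Apply one action to all data items of each final cluster."""
    for center, members in clusters:
        for index in members:
            action(index, center)
```

Calling `permutation_cluster(items)` on a list of equal-length tuples of categorical values yields the final clusters as member-index lists, and `apply_common_action` then applies a single callback (for example, assigning a shared label) to every item in a cluster. Sorting on a different permutation in each iteration places different sets of similar points next to one another, so points that landed in separate blocks in one pass can still be merged in a later pass.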