| CPC G06F 21/60 (2013.01) [G06F 18/23213 (2023.01); G06F 40/12 (2020.01)] | 20 Claims |

|
1. A method for clustering data objects, said method comprising:
accessing, by one or more processors of a computer system, a set of data objects arranged in an initial sequential order, wherein the set of data objects consists of S data objects, wherein S is at least 2, wherein each data object includes a code and a score, wherein each code represents an instance of the data object and each code is a positive integer subject to codes collectively consisting of positive integers 1, 2, . . . , N subject to N≤S, wherein each score is a positive real number denoting a measure of a parameter pertaining to the instance that is represented by the code, wherein scores collectively consist of B unique scores subject to B≤S;
sorting, by the one or more processors, the data objects using the score as a sort key to rearrange the data objects in an ascending order of the score, wherein each unique score has a sequence number in the sorted data objects, resulting in B consecutive sequence numbers;
transforming, by the one or more processors, the S data objects into respective S binary words, wherein each binary word corresponding to a data object of the S data objects consists of B bits characterized by: (i) a 1 bit in a bit position of the binary word corresponding to respective sequence number of sorted unique score and (ii) a 0 bit in all other bit positions of the binary word;
encoding, by the one or more processors, the S data objects into a sequence of N blocks, wherein each block consists of B bits in a binary format, and wherein the N blocks are sequenced and have bit configurations that depend on the initial sequential order of the data objects and the sequence numbers of sorted unique scores;
generating, by the one or more processors from the N blocks, M block clusters respectively comprising M respective cluster centers, wherein each cluster center is a different block of the N blocks, wherein R remaining blocks of the N blocks are distributed into the M block clusters in a manner that minimizes a weighted bit separation distance between each of the R remaining blocks and each of the M cluster centers, wherein M+R=N, and wherein 2<M<N;
converting, by the one or more processors, the M block clusters into respective M word clusters of binary words, wherein the S binary words are distributed into the respective M word clusters; and
for each word cluster of the respective M word clusters having J binary words in each word cluster, reconfiguring, by the one or more processors, an M word cluster, or the respective M word clusters, into L word clusters into which the J binary words are distributed, by minimizing a total number of deviations in the L word clusters, wherein L is at least 1.
|
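The sorting and transforming steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `encode` and `bits` are hypothetical, bit position 1 is assumed to be the leftmost bit, and the rule used to build each block (a bitwise OR of the binary words that share a code) is an assumption, since the claim only states that the block bit configurations depend on the initial order and on the sequence numbers of the sorted unique scores. The clustering and reconfiguring steps are not sketched because the claim does not define the weighted bit separation distance or the deviations.

```python
from collections import defaultdict


def encode(objects):
    """objects: list of (code, score) pairs in their initial sequential order."""
    # Rank the B unique scores in ascending order; sequence numbers run 1..B.
    unique_scores = sorted({score for _, score in objects})
    seq = {score: i + 1 for i, score in enumerate(unique_scores)}
    b = len(unique_scores)

    # One B-bit word per data object: a single 1 bit at the position given by
    # the sequence number of its score (ASSUMPTION: position 1 is leftmost).
    words = [(code, 1 << (b - seq[score])) for code, score in objects]

    # ASSUMPTION: the block for a code is the bitwise OR of all words that
    # carry that code. The claim does not spell out this rule.
    blocks = defaultdict(int)
    for code, word in words:
        blocks[code] |= word
    return b, words, dict(blocks)


def bits(value, width):
    """Render an integer as a zero-padded binary string of the given width."""
    return format(value, f"0{width}b")


if __name__ == "__main__":
    # S = 4 data objects, N = 3 codes, B = 3 unique scores.
    data = [(1, 0.5), (2, 1.5), (1, 1.5), (3, 0.2)]
    b, words, blocks = encode(data)
    for code, word in words:
        print("word ", code, bits(word, b))
    for code in sorted(blocks):
        print("block", code, bits(blocks[code], b))
```

With this sample input the unique scores 0.2, 0.5 and 1.5 receive sequence numbers 1, 2 and 3, so the four binary words are 010, 001, 001 and 100, and under the OR assumption the blocks for codes 1, 2 and 3 are 011, 001 and 100.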