CPC G06F 16/906 (2019.01) | 20 Claims
1. A computer-implemented method for clustering a plurality of content items of a data set, comprising:
receiving a request to display the plurality of content items;
identifying the data set comprising the plurality of content items for clustering across a plurality of iterations for each of a plurality of levels, including both a first level of clustering with a first similarity threshold, and a second level of clustering with a second similarity threshold different from the first similarity threshold;
performing for each of the plurality of levels:
identifying a plurality of pairs of content items amongst the plurality of content items;
computing a similarity score for each of the plurality of pairs of content items, the similarity score indicating a similarity between the content items comprising a respective pair;
identifying a subset of pairs from the plurality of pairs, wherein the similarity score, for each pair from the subset of pairs, exceeds a similarity threshold for a respective level;
clustering the subset of pairs, identified from the plurality of pairs of content items, into a clustered subset based on the similarity score, for each pair from the subset of pairs, exceeding the similarity threshold for the respective level; and
repeating the identifying the plurality of pairs, the computing the similarity score, the identifying the subset, and the clustering the subset for each of the plurality of iterations for the respective level, for each subsequent iteration at the respective level;
identifying a final clustered subset comprising the clustered subset after each of the plurality of iterations for each of the plurality of levels after the performing has been completed; and
outputting the final clustered subset for display, responsive to the request to display the plurality of content items.
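The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the claimed implementation. The claim does not specify a similarity measure, a merging strategy, or how a cluster is compared in later iterations, so the following are assumptions made only for illustration: a token-set Jaccard score (`jaccard`), merging of qualifying pairs with union-find, representing each cluster by the concatenated text of its members, and a fixed iteration count per level. The names `cluster_levels`, `thresholds`, and `iterations` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed similarity score: token-set overlap in [0, 1]."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster_levels(items, thresholds, iterations=2, similarity=jaccard):
    """Cluster items over several levels, each with its own threshold.

    Returns a list of clusters; each cluster is a sorted list of item indices.
    """
    # Start with every content item in its own cluster.
    clusters = [[i] for i in range(len(items))]

    for threshold in thresholds:          # one pass per level
        for _ in range(iterations):       # repeated iterations at this level
            # Assumption: a cluster is represented by its members' joined text.
            reps = [" ".join(items[i] for i in c) for c in clusters]

            # Identify the pairs and compute a similarity score for each.
            pairs = list(combinations(range(len(clusters)), 2))
            scores = {p: similarity(reps[p[0]], reps[p[1]]) for p in pairs}

            # Keep only the pairs whose score exceeds this level's threshold.
            subset = [p for p in pairs if scores[p] > threshold]

            # Merge the selected pairs with a small union-find structure.
            parent = list(range(len(clusters)))

            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]  # path halving
                    x = parent[x]
                return x

            for a, b in subset:
                ra, rb = find(a), find(b)
                if ra != rb:
                    parent[max(ra, rb)] = min(ra, rb)

            merged = {}
            for idx, members in enumerate(clusters):
                merged.setdefault(find(idx), []).extend(members)
            clusters = [sorted(m) for m in merged.values()]

    # The clusters remaining after all levels form the final clustered subset.
    return clusters


items = [
    "red apple pie",
    "red apple tart",
    "green apple pie",
    "blue car",
    "blue fast car",
]
# First level uses a stricter threshold (0.6), second a looser one (0.4).
final = cluster_levels(items, thresholds=[0.6, 0.4])
print(final)  # [[0, 1, 2], [3, 4]]
```

In this example the first level merges only the two "car" items (score 2/3), and the looser second level then groups the three "apple" items (score 0.5 for two of the pairs). Running the levels from strict to loose is a design choice of the sketch; the claim only requires that the two thresholds differ.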