CPC G06F 16/285 (2019.01) [G06N 20/00 (2019.01)] | 20 Claims |
1. A computer-implemented method of multi-stage clustering, comprising:
receiving a data set comprising a first plurality of elements, where each element of the first plurality of elements includes respective text describing the element;
dividing the first plurality of elements into a first plurality of clusters at a first hierarchical level using a clustering algorithm;
dividing each of the first plurality of clusters into a second plurality of clusters at a second hierarchical level using the clustering algorithm;
dividing each of the second plurality of clusters into a third plurality of clusters at a third hierarchical level using the clustering algorithm;
dividing each of the third plurality of clusters into a fourth plurality of clusters at a fourth hierarchical level using the clustering algorithm;
determining resource data representing a number of resources for labeling the first plurality of elements based at least in part on the first plurality of clusters at the first hierarchical level;
assigning the number of resources to label clusters of the second plurality of clusters at the second hierarchical level based on a number of the fourth plurality of clusters in each cluster of the second plurality of clusters;
assigning, for a first cluster of the fourth plurality of clusters, a label that classifies elements of the first cluster as belonging to a first class; and
determining that the elements of the first cluster are non-compliant with respect to a first policy based at least in part on the label.
|
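The steps recited in claim 1 can be illustrated with a minimal sketch, which is not the claimed implementation. Every name in it is hypothetical. The clustering algorithm is replaced by a toy stand-in that groups each element by the word at a given position in its text. The resource count is assumed to be a fixed number per first-level cluster. The split across second-level clusters is assumed to be proportional to the number of fourth-level clusters each contains. The labeler and the policy are likewise invented for illustration.

```python
from collections import defaultdict

def split_by_token(texts, depth):
    """Stand-in clustering algorithm: group texts by their word at index `depth`."""
    groups = defaultdict(list)
    for text in texts:
        words = text.split()
        groups[words[depth] if depth < len(words) else ""].append(text)
    return dict(groups)

def build_hierarchy(texts, levels=4, depth=0):
    """Divide texts into nested clusters, reusing the same algorithm at each level."""
    if depth == levels:
        return texts  # members of a cluster at the deepest level
    return {key: build_hierarchy(members, levels, depth + 1)
            for key, members in split_by_token(texts, depth).items()}

def count_clusters_below(node, levels_below):
    """Count the clusters located `levels_below` levels beneath `node`."""
    if levels_below == 0:
        return 1
    return sum(count_clusters_below(child, levels_below - 1)
               for child in node.values())

def allocate(total, weights):
    """Split `total` integer resources in proportion to `weights` (largest remainder)."""
    weight_sum = sum(weights.values())
    shares = {k: total * w // weight_sum for k, w in weights.items()}
    leftover = total - sum(shares.values())
    order = sorted(weights, key=lambda k: (-(total * weights[k] % weight_sum), str(k)))
    for k in order[:leftover]:
        shares[k] += 1
    return shares

# Hypothetical data set: each element is described by a short text.
texts = [
    "toy plush bear brown", "toy plush bear white", "toy plush rabbit grey",
    "toy plastic robot red", "tool power drill cordless",
    "tool power saw circular", "tool hand hammer claw",
]
hierarchy = build_hierarchy(texts)

# Assumption: the resource count scales with the number of first-level clusters.
RESOURCES_PER_TOP_CLUSTER = 7
total_resources = RESOURCES_PER_TOP_CLUSTER * len(hierarchy)

# Weight each second-level cluster by the fourth-level clusters it contains.
weights = {(top, second): count_clusters_below(node, 2)
           for top, children in hierarchy.items()
           for second, node in children.items()}
assignment = allocate(total_resources, weights)

# Hypothetical labeler and policy applied to one fourth-level cluster.
PROHIBITED_LABELS = {"power_tool"}

def label_cluster(members):
    return "power_tool" if all("power" in m.split() for m in members) else "general"

first_cluster = hierarchy["tool"]["power"]["drill"]["cordless"]
label = label_cluster(first_cluster)
non_compliant = label in PROHIBITED_LABELS
print(assignment, label, non_compliant)
```

In this sketch the second-level cluster with the most fourth-level clusters receives the largest share of labeling resources, which is one plausible reading of the assigning step. A real system would substitute an actual clustering algorithm and a policy engine for the stand-ins.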