US 11,782,955 B1
Multi-stage clustering
Deepika Jindal, Kent, WA (US); and Igor Grudetskyi, Seattle, WA (US)
Assigned to AMAZON TECHNOLOGIES, INC., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Aug. 26, 2021, as Appl. No. 17/458,184.
Int. Cl. G06F 16/28 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/285 (2019.01) [G06N 20/00 (2019.01)]
20 Claims
1. A computer-implemented method of multi-stage clustering, comprising:
receiving a data set comprising a first plurality of elements, where each element of the first plurality of elements includes respective text describing the element;
dividing the first plurality of elements into a first plurality of clusters at a first hierarchical level using a clustering algorithm;
dividing each of the first plurality of clusters into a second plurality of clusters at a second hierarchical level using the clustering algorithm;
dividing each of the second plurality of clusters into a third plurality of clusters at a third hierarchical level using the clustering algorithm;
dividing each of the third plurality of clusters into a fourth plurality of clusters at a fourth hierarchical level using the clustering algorithm;
determining resource data representing a number of resources for labeling the first plurality of elements based at least in part on the first plurality of clusters at the first hierarchical level;
assigning the number of resources to label clusters of the second plurality of clusters at the second hierarchical level based on a number of the fourth plurality of clusters in each cluster of the second plurality of clusters;
assigning, for a first cluster of the fourth plurality of clusters, a label that classifies elements of the first cluster as belonging to a first class; and
determining that the elements of the first cluster are non-compliant with respect to a first policy based at least in part on the label.
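
The steps recited in claim 1 can be illustrated with a minimal sketch under stated assumptions; it is not the patented implementation. The claim does not name a clustering algorithm, so `split_by_token` is a toy stand-in that groups texts by the word at a given position. The per-cluster resource budget (`RESOURCES_PER_TOP_CLUSTER`), the proportional largest-remainder allocation in `allocate`, the keyword labeler `label_cluster`, and the policy set `PROHIBITED_LABELS` are likewise hypothetical choices made only for illustration.

```python
from collections import defaultdict

RESOURCES_PER_TOP_CLUSTER = 2    # assumption: labeling budget per first-level cluster
PROHIBITED_LABELS = {"weapon"}   # assumption: labels treated as violating a policy


def split_by_token(elements, depth):
    """Toy stand-in for a clustering algorithm: group texts by their depth-th word."""
    groups = defaultdict(list)
    for text in elements:
        words = text.lower().split()
        groups[words[depth] if depth < len(words) else ""].append(text)
    return list(groups.values())


def build_hierarchy(elements, levels=4, depth=0):
    """Recursively divide elements into nested clusters, one split per level."""
    if depth == levels:
        return []
    return [
        {"elements": group, "children": build_hierarchy(group, levels, depth + 1)}
        for group in split_by_token(elements, depth)
    ]


def descendants(nodes, generations):
    """Return the clusters found `generations` levels below the given clusters."""
    for _ in range(generations):
        nodes = [child for node in nodes for child in node["children"]]
    return nodes


def allocate(total, weights):
    """Split `total` integer resources in proportion to `weights` (largest remainder)."""
    weight_sum = sum(weights)
    parts = [divmod(total * w, weight_sum) for w in weights]
    shares = [quotient for quotient, _ in parts]
    leftover = total - sum(shares)
    order = sorted(range(len(weights)), key=lambda i: parts[i][1], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def label_cluster(cluster):
    """Hypothetical labeler; in practice a human annotator or a model would label."""
    is_weapon = any("knife" in text for text in cluster["elements"])
    return "weapon" if is_weapon else "general"


def is_non_compliant(label):
    """Policy check based only on the cluster's label."""
    return label in PROHIBITED_LABELS


def run(elements):
    level1 = build_hierarchy(elements)                 # first hierarchical level
    level2 = descendants(level1, 1)                    # second hierarchical level
    level4 = descendants(level1, 3)                    # fourth hierarchical level
    total = RESOURCES_PER_TOP_CLUSTER * len(level1)    # resource data from level 1
    # Weight each second-level cluster by how many fourth-level clusters it contains.
    weights = [len(descendants([node], 2)) for node in level2]
    shares = allocate(total, weights)
    flagged = [c["elements"] for c in level4 if is_non_compliant(label_cluster(c))]
    return total, shares, flagged
```

Calling `run` on a list of element descriptions returns the total resource count, the per-cluster shares at the second level, and the elements of every fourth-level cluster whose label fails the policy check. Weighting by fourth-level cluster count sends more labeling effort to second-level clusters that fragment into many fine-grained groups.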