US 12,248,503 B2
Systems and methods for cloud content-based document clustering and classification integration
Zhihong Zeng, Acton, MA (US); Anwar Chaudhry, Mississauga (CA); Rajesh Chandrasekhar, Franklin, TN (US); Adam Darius Williams, Harleysville, PA (US); and Utpal N. Gandhi, Wilmington, MA (US)
Assigned to Iron Mountain Incorporated, Boston, MA (US)
Filed by Iron Mountain Incorporated, Boston, MA (US)
Filed on May 29, 2020, as Appl. No. 16/887,751.
Claims priority of provisional application 62/854,230, filed on May 29, 2019.
Prior Publication US 2020/0380254 A1, Dec. 3, 2020
Int. Cl. G06F 16/35 (2019.01); G06F 16/355 (2025.01); G06V 10/762 (2022.01); G06V 10/82 (2022.01); G06V 30/10 (2022.01); G06V 30/19 (2022.01); G06V 30/40 (2022.01); G06V 30/413 (2022.01); G06V 30/416 (2022.01)

CPC G06F 16/355 (2019.01) [G06V 10/763 (2022.01); G06V 10/82 (2022.01); G06V 30/19127 (2022.01); G06V 30/40 (2022.01); G06V 30/413 (2022.01); G06V 30/416 (2022.01); G06V 30/10 (2022.01)]

15 Claims

1. A computer-implemented method, comprising:

accessing, by a processor, a set of asset documents;

receiving a feature extraction tool selection from a set of feature extraction tools stored within a feature extraction module, wherein the set of feature extraction tools stored within the feature extraction module comprises a bidirectional encoder representation of transformers (BERT) based feature extraction tool or a bag-of-words feature extraction tool;

performing, by the processor, feature extraction on text of each document of the set of asset documents using the feature extraction tool to generate a set of features, wherein each feature of the set of features represents a document of the set of asset documents;

providing, by the processor, the set of features to a singular value decomposition module;

generating, by the processor, a set of lower-dimensional features from the set of features using the singular value decomposition module;

generating, by the processor, a set of clusters representing document classes from the set of lower-dimensional features using a clustering module, wherein the clustering module is configured to perform a density based spatial clustering operation by assigning at least one document of the set of asset documents as a border sample document within the cluster of the set of clusters;

performing, by the processor, a grid search to adjust adjustable parameters of the feature extraction module, the singular value decomposition module, the clustering module, or a combination thereof; and

training, by the processor, a machine-learning model of a classification microservice using the set of clusters generated from the clustering module, wherein the classification microservice is configured to assign a document class of the set of document classes to additional documents added to the set of asset documents and further train the machine-learning model using results of clustering the additional documents.
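Claim 1 recites a feature extraction tool (BERT-based or bag-of-words) followed by a singular value decomposition module that produces lower-dimensional features. The claim does not specify tokenization, term weighting, or how many singular components are kept. The following is a minimal pure-Python sketch under those assumptions: raw term counts serve as the bag-of-words features (the BERT option is not shown), and a rank-k truncated SVD is computed by power iteration with deflation. The function names `bag_of_words` and `truncated_svd_features` are hypothetical and do not come from the patent.

```python
import math
import random
from typing import List, Tuple


def bag_of_words(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, document-term count matrix) with one row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for tok in toks:
            row[index[tok]] += 1.0
        rows.append(row)
    return vocab, rows


def _matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _transpose(m: List[List[float]]) -> List[List[float]]:
    return [list(col) for col in zip(*m)]


def truncated_svd_features(x: List[List[float]], k: int,
                           iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Project each row of x onto its top-k right singular vectors.

    Uses power iteration on X^T X with deflation; suitable only for small
    illustrative matrices, not for production-scale document sets.
    """
    rng = random.Random(seed)
    xt = _transpose(x)
    n_cols = len(xt)
    components: List[List[float]] = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_cols)]
        for _ in range(iters):
            w = _matvec(xt, _matvec(x, v))  # w = X^T X v
            for c in components:  # remove directions already found (deflation)
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0:  # matrix rank exhausted; contribute a zero component
                v = [0.0] * n_cols
                break
            v = [a / norm for a in w]
        components.append(v)
    # Row i of X V_k equals row i of U_k S_k: the k-dimensional feature of document i.
    return [[sum(a * b for a, b in zip(row, c)) for c in components] for row in x]
```

Because singular vectors are only defined up to sign (and up to rotation when singular values tie), individual coordinates can differ between runs or libraries, but distances between the projected documents are preserved.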
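The clustering element calls for a density-based spatial clustering operation that can label a document as a border sample. DBSCAN is the standard algorithm of this kind. A point with at least `min_samples` neighbours within distance `eps` (counting itself) is a core point. A non-core point within `eps` of a core point is a border point, and all remaining points are noise. The sketch below is a plain-Python DBSCAN over the lower-dimensional feature vectors. It assumes Euclidean distance because the claim states neither the metric nor the parameter values, and the name `dbscan_with_roles` is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dbscan_with_roles(points: Sequence[Sequence[float]], eps: float,
                      min_samples: int) -> Tuple[List[int], List[str]]:
    """Return (labels, roles). Label -1 is noise; roles are 'core', 'border' or 'noise'."""
    n = len(points)
    # Precompute eps-neighbourhoods; each neighbourhood includes the point itself.
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    is_core = [len(nb) >= min_samples for nb in neigh]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster outward from an unlabelled core point.
        labels[i] = cluster
        queue = list(neigh[i])
        while queue:
            j = queue.pop()
            if labels[j] is None:
                labels[j] = cluster
                if is_core[j]:
                    queue.extend(neigh[j])  # only core points expand the cluster
        cluster += 1
    final = [-1 if lab is None else lab for lab in labels]
    roles = ["core" if is_core[i] else ("border" if final[i] != -1 else "noise")
             for i in range(n)]
    return final, roles
```

For example, four tightly packed points plus one point lying just within `eps` of the group's edge yield a single cluster in which the outlying point is labelled `border`. A distant point is labelled `noise`.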
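The claim also recites a grid search over adjustable parameters of the feature extraction, singular value decomposition, and clustering modules, but it names neither the parameters nor the objective being optimized. The sketch below assumes that the caller supplies an `evaluate` function, which runs the pipeline for one parameter setting and returns a higher-is-better score. The parameter names (`min_term_count`, `svd_components`, `eps`, `min_samples`) and the toy objective are hypothetical placeholders.

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple


def grid_search(param_grid: Dict[str, List[Any]],
                evaluate: Callable[[Dict[str, Any]], float]) -> Tuple[Dict[str, Any], float]:
    """Try every combination in param_grid; return the best params and their score."""
    names = sorted(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # strict '>' keeps the first of any tied settings
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid spanning the three modules named in the claim.
grid = {
    "min_term_count": [1, 2],      # feature extraction module
    "svd_components": [2, 5, 10],  # singular value decomposition module
    "eps": [0.3, 0.5],             # clustering module
    "min_samples": [3, 5],         # clustering module
}


def toy_score(p: Dict[str, Any]) -> float:
    # Illustrative objective only: it peaks at one arbitrary setting.
    return -(abs(p["svd_components"] - 5) + abs(p["eps"] - 0.5)
             + 0.1 * abs(p["min_samples"] - 5) + (p["min_term_count"] - 1))


best, score = grid_search(grid, toy_score)
```

A real objective would more likely be a cluster-quality measure computed on the clustering output, but the claim does not say which measure is used.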
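The final element trains a machine-learning model of a classification microservice on the clusters. The model then assigns classes to newly added documents and is further trained with the results of clustering those documents. The claim does not identify the model type, so the sketch below substitutes a simple nearest-centroid classifier that supports incremental updates. The class name `NearestCentroidClassifier` and the example data are hypothetical. DBSCAN noise points (label -1) are assumed to be excluded from training.

```python
import math
from typing import Dict, List, Sequence


class NearestCentroidClassifier:
    """Toy stand-in for the classification model: one running mean per document class."""

    def __init__(self) -> None:
        self.sums: Dict[int, List[float]] = {}
        self.counts: Dict[int, int] = {}

    def partial_fit(self, features: Sequence[Sequence[float]], labels: Sequence[int]) -> None:
        """Add labelled examples incrementally; cluster label -1 (noise) is skipped."""
        for x, y in zip(features, labels):
            if y == -1:
                continue
            if y not in self.sums:
                self.sums[y] = [0.0] * len(x)
                self.counts[y] = 0
            self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
            self.counts[y] += 1

    def predict(self, x: Sequence[float]) -> int:
        """Return the class whose centroid is closest (Euclidean) to x."""
        def dist(label: int) -> float:
            centroid = [s / self.counts[label] for s in self.sums[label]]
            return math.dist(x, centroid)
        return min(self.sums, key=dist)


# Train on cluster labels produced for the original asset documents.
clf = NearestCentroidClassifier()
clf.partial_fit([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]], [0, 0, 1, 1])

# Classify additional documents, then retrain with their clustering results.
new_docs = [[0.1, 0.3], [4.9, 5.1]]
predicted = [clf.predict(x) for x in new_docs]
new_cluster_labels = [0, 1]  # assumed output of re-clustering the new documents
clf.partial_fit(new_docs, new_cluster_labels)
```

A running-sum centroid is used here because it can absorb new labelled documents without retraining from scratch. This mirrors the claim's "further train" step, but it is not necessarily how the claimed microservice is built.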