| CPC G06F 16/355 (2019.01) [G06V 10/763 (2022.01); G06V 10/82 (2022.01); G06V 30/19127 (2022.01); G06V 30/40 (2022.01); G06V 30/413 (2022.01); G06V 30/416 (2022.01); G06V 30/10 (2022.01)] | 15 Claims |

|
1. A computer-implemented method, comprising:
accessing, by a processor, a set of asset documents;
receiving a selection of a feature extraction tool from a set of feature extraction tools stored within a feature extraction module, wherein the set of feature extraction tools stored within the feature extraction module comprises a Bidirectional Encoder Representations from Transformers (BERT)-based feature extraction tool or a bag-of-words feature extraction tool;
performing, by the processor, feature extraction on text of each document of the set of asset documents using the selected feature extraction tool to generate a set of features, wherein each feature of the set of features represents a respective document of the set of asset documents;
providing, by the processor, the set of features to a singular value decomposition module;
generating, by the processor, a set of lower-dimensional features from the set of features using the singular value decomposition module;
generating, by the processor, a set of clusters representing document classes from the set of lower-dimensional features using a clustering module, wherein the clustering module is configured to perform a density-based spatial clustering operation that assigns at least one document of the set of asset documents as a border sample document within a cluster of the set of clusters;
performing, by the processor, a grid search to tune adjustable parameters of the feature extraction module, the singular value decomposition module, the clustering module, or a combination thereof; and
training, by the processor, a machine-learning model of a classification microservice using the set of clusters generated by the clustering module, wherein the classification microservice is configured to assign a document class of the document classes represented by the set of clusters to additional documents added to the set of asset documents, and to further train the machine-learning model using results of clustering the additional documents.
|
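The claimed pipeline (feature extraction, singular value decomposition, density-based clustering with border samples, grid search, and a classifier trained on the clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation. It uses the bag-of-words option only (a BERT encoder would take the place of `bow_features`). It scores the grid search by mean silhouette, because the claim names no objective. It stands in a nearest-centroid model for the unspecified machine-learning model. All function and class names (`bow_features`, `dbscan`, `grid_search`, `ClassificationService`) and the sample documents are hypothetical.

```python
import itertools
import re
from collections import Counter

import numpy as np


def bow_features(docs, vocab):
    """Bag-of-words counts, L2-normalised, one row per document."""
    X = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for tok, n in Counter(re.findall(r"[a-z]+", doc.lower())).items():
            if tok in vocab:  # words unseen at fit time are dropped
                X[row, vocab[tok]] = n
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def pairwise(A, B):
    """Euclidean distance between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)


def dbscan(Z, eps, min_samples):
    """Plain DBSCAN: labels (-1 = noise) plus a mask of border samples."""
    neighbors = [np.flatnonzero(row <= eps) for row in pairwise(Z, Z)]  # includes self
    core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels, cluster = np.full(len(Z), -1), 0
    for i in np.flatnonzero(core):
        if labels[i] != -1:
            continue
        labels[i], stack = cluster, [i]
        while stack:
            j = stack.pop()
            if not core[j]:
                continue  # border samples join a cluster but never expand it
            for nb in neighbors[j]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    stack.append(nb)
        cluster += 1
    return labels, (labels != -1) & ~core


def silhouette(Z, labels):
    """Mean silhouette over non-noise points; -1 when fewer than two clusters."""
    Z, labels = Z[labels != -1], labels[labels != -1]
    ids = np.unique(labels)
    if len(ids) < 2:
        return -1.0
    d, scores = pairwise(Z, Z), []
    for i, lab in enumerate(labels):
        same = labels == lab
        same[i] = False
        if not same.any():
            scores.append(0.0)  # singleton cluster scores 0 by convention
            continue
        a = d[i, same].mean()
        b = min(d[i, labels == o].mean() for o in ids if o != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))


def grid_search(X, grid):
    """Exhaustive search over SVD rank and DBSCAN parameters, scored by silhouette."""
    best, best_score = None, -np.inf
    for k, eps, m in itertools.product(grid["n_components"], grid["eps"], grid["min_samples"]):
        basis = np.linalg.svd(X, full_matrices=False)[2][:k]  # top-k right singular vectors
        Z = X @ basis.T  # lower-dimensional features
        labels, border = dbscan(Z, eps, m)
        score = silhouette(Z, labels)
        if score > best_score:
            best_score = score
            best = dict(n_components=k, eps=eps, min_samples=m,
                        basis=basis, Z=Z, labels=labels, border=border)
    return best


class ClassificationService:
    """Hypothetical nearest-centroid classifier trained on the cluster labels."""

    def __init__(self, vocab, basis, Z, labels):
        self.vocab, self.basis = vocab, basis
        self.fit(Z, labels)

    def fit(self, Z, labels):
        keep = labels != -1  # noise documents carry no class
        self.Z, self.labels = Z[keep], labels[keep]
        self.classes = np.unique(self.labels)
        self.centroids = np.array([self.Z[self.labels == c].mean(axis=0) for c in self.classes])

    def embed(self, docs):
        return bow_features(docs, self.vocab) @ self.basis.T

    def predict(self, docs):
        return self.classes[pairwise(self.embed(docs), self.centroids).argmin(axis=1)]

    def add_documents(self, docs):
        """Classify new documents, then refit the centroids with them included."""
        labels = self.predict(docs)
        self.fit(np.vstack([self.Z, self.embed(docs)]), np.concatenate([self.labels, labels]))
        return labels


docs = [
    "invoice total amount due payment", "invoice payment due total",
    "invoice amount payment total due now", "lease agreement tenant landlord term",
    "lease tenant landlord", "agreement lease term landlord tenant rent",
]
words = sorted({w for d in docs for w in re.findall(r"[a-z]+", d.lower())})
vocab = {w: i for i, w in enumerate(words)}
X = bow_features(docs, vocab)
best = grid_search(X, {"n_components": [2, 3], "eps": [0.8, 2.0], "min_samples": [2, 3]})
service = ClassificationService(vocab, best["basis"], best["Z"], best["labels"])
print(best["eps"], best["labels"])  # 0.8 [0 0 0 1 1 1]
print(service.add_documents(["Total amount due: invoice payment"]))  # [0]
```

In this toy run, `eps=2.0` merges every document into a single cluster. That configuration scores -1 and the grid search rejects it. `best["border"]` marks any documents that DBSCAN attached to a cluster without being core points. Here every document is a core point, so the mask is all `False`. The self-training step in `add_documents` is a simplification: it refits on the classifier's own assignments rather than re-running the clustering. A production version might instead use scikit-learn's `CountVectorizer`, `TruncatedSVD`, `DBSCAN` (whose `core_sample_indices_` separates core from border samples), `ParameterGrid`, and `silhouette_score`.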