CPC G06F 16/285 (2019.01) [G06F 16/215 (2019.01)] | 20 Claims |
1. A method, comprising:
receiving, by a device, data objects from an object corpus stored in a data structure;
identifying, by the device, unique segments within the data objects as elements;
replacing, by the device, all equivalent segments with one representative segment;
generating, by the device, an embedding space based on unique elements and mappings of the data objects to embeddings;
estimating, by the device, semantic proximities among the data objects based on the mappings of the data objects to the embeddings;
building, by the device, a semantic cohesion network among the data objects based on the semantic proximities among the data objects;
identifying, by the device, semantically cohesive data clusters in the semantic cohesion network;
sorting, by the device, the data objects in the semantically cohesive data clusters to generate semantically cohesive and sorted data clusters;
receiving, by the device, a new data object;
determining, by the device and from the semantically cohesive and sorted data clusters, a home data cluster for the new data object;
determining, by the device, a semantic relationship between the new data object and a data object in the home data cluster; and
storing, by the device, bookkeeping details of the new data object in the data structure based on the semantic relationship between the new data object and the data object in the home data cluster.
|
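For illustration only, the steps recited in claim 1 can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the claimed implementation: segments are assumed to be whitespace-separated tokens, equivalent segments are assumed to be those that match after lowercasing and stripping punctuation, the "embedding" of a data object is simply its set of unique elements, Jaccard similarity stands in for semantic proximity, and connected components stand in for semantically cohesive clusters. All function names, the `threshold` parameter, the relationship labels, and the bookkeeping fields are hypothetical.

```python
from itertools import combinations


def canonical_elements(text):
    # Assumption: segments are whitespace tokens; equivalent segments are
    # collapsed to one representative by lowercasing and stripping punctuation.
    tokens = (tok.strip(".,;:!?").lower() for tok in text.split())
    return frozenset(tok for tok in tokens if tok)


def proximity(a, b):
    # Jaccard similarity of element sets, a stand-in for semantic proximity.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_clusters(corpus, threshold=0.5):
    # Drop exact duplicate objects while preserving order.
    objects = list(dict.fromkeys(corpus))
    # Map each data object to its (toy) embedding.
    embeddings = {obj: canonical_elements(obj) for obj in objects}

    # Cohesion network: link objects whose proximity meets the threshold.
    network = {obj: set() for obj in objects}
    for a, b in combinations(objects, 2):
        if proximity(embeddings[a], embeddings[b]) >= threshold:
            network[a].add(b)
            network[b].add(a)

    # Clusters: connected components of the network, each sorted.
    clusters, seen = [], set()
    for start in objects:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in network[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return embeddings, clusters


def ingest(new_obj, embeddings, clusters, store):
    # Home cluster: the cluster holding the member closest to the new object.
    new_emb = canonical_elements(new_obj)
    score, home, nearest = max(
        (proximity(new_emb, embeddings[member]), index, member)
        for index, cluster in enumerate(clusters)
        for member in cluster
    )
    # Hypothetical relationship labels derived from the proximity score.
    if score == 1.0:
        relationship = "duplicate"
    elif score > 0.0:
        relationship = "similar"
    else:
        relationship = "unrelated"
    # Bookkeeping details stored against the new object.
    store[new_obj] = {
        "home_cluster": home,
        "nearest": nearest,
        "relationship": relationship,
        "proximity": score,
    }
    return store[new_obj]


corpus = [
    "The cat sat.",
    "the cat sat down",
    "Stock prices fell",
    "stock prices fell sharply",
]
embeddings, clusters = build_clusters(corpus)
store = {}
record = ingest("THE CAT SAT", embeddings, clusters, store)
print(clusters)
print(record)
```

In this sketch the new object is recorded by reference to its nearest neighbor in the home cluster rather than being compared against the whole corpus again. A production system would likely replace the token-set embedding with a learned vector space and the connected-component step with a dedicated community-detection method; those choices are outside what the claim text specifies.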