| CPC G06F 16/90335 (2019.01) [G06F 40/30 (2020.01)] | 20 Claims |

|
1. An apparatus comprising:
at least one processing platform comprising at least one processor coupled to at least one memory, the at least one processing platform, when executing program code, is configured to:
collect data from a plurality of data sources;
extract structured data and unstructured data from the collected data using an unsupervised machine learning model, wherein the extracting comprises:
selecting and processing a portion of the unstructured data using one or more unstructured data filters based on one or more natural language processing metrics;
selecting a portion of the structured data using one or more structured data filters; and
applying the unsupervised machine learning model to the selected and processed portion of the unstructured data to determine one or more topics of the unstructured data;
form a plurality of sub-graph structures comprising a sub-graph structure for each of the data sources based on at least the selected portion of the structured data and the selected and processed portion of the unstructured data;
compute, for a given node of a given sub-graph structure, a title embedding based on a title portion of the selected portion of structured data and a topic embedding based on the one or more topics of the unstructured data and one or more other portions of the selected portion of the structured data;
combine the plurality of sub-graph structures to form a combined graph structure representing the collected data from the plurality of data sources, wherein combining the plurality of sub-graph structures to form a combined graph structure further comprises:
using a graph and report generator to measure a distance between two nodes and adding an edge between the two nodes when the distance is at or below a given distance threshold value; and
apply the combined graph structure to one or more user tasks.
|
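
The combining step recited in claim 1 (measuring a distance between two nodes and adding an edge when the distance is at or below a threshold) can be illustrated with a minimal sketch. The claim does not name a distance metric, a node representation, or any function names, so everything below is an assumption made for illustration: cosine distance is used as the metric, each node is represented by a single vector standing in for its title and topic embeddings, and `cosine_distance`, `combine_subgraphs`, the node identifiers, and the threshold value are all hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two vectors (assumed metric; the claim names none)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def combine_subgraphs(subgraphs, threshold):
    """Merge per-source sub-graphs, then link node pairs whose distance is
    at or below `threshold`.

    Each sub-graph is a dict with:
      "nodes": {node_id: embedding_vector}
      "edges": iterable of (node_id, node_id) pairs
    """
    nodes = {}
    edges = set()
    for sub in subgraphs:
        nodes.update(sub["nodes"])
        edges.update(frozenset(edge) for edge in sub["edges"])

    # Compare every pair of nodes in the merged graph.
    for u, v in combinations(sorted(nodes), 2):
        if cosine_distance(nodes[u], nodes[v]) <= threshold:
            edges.add(frozenset((u, v)))
    return nodes, edges


# Hypothetical sub-graphs, one per data source.
tickets = {"nodes": {"ticket:1": [1.0, 0.0], "ticket:2": [0.0, 1.0]}, "edges": []}
wiki = {"nodes": {"wiki:1": [0.9, 0.1]}, "edges": []}

nodes, edges = combine_subgraphs([tickets, wiki], threshold=0.1)
```

With these made-up vectors, only `ticket:1` and `wiki:1` lie within the threshold, so the combined graph gains a single cross-source edge. The all-pairs comparison is quadratic in the number of nodes; it is kept here only because it is the simplest correct reading of the recited step, and the claim itself does not state how candidate pairs are chosen.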