US 12,265,577 B2
Knowledge graph management based on multi-source data
Zijia Wang, WeiFang (CN); Victor Fong, Medford, MA (US); Zhen Jia, Shanghai (CN); and Jiacheng Ni, Shanghai (CN)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Apr. 14, 2021, as Appl. No. 17/230,433.
Prior Publication US 2022/0335307 A1, Oct. 20, 2022
Int. Cl. G06F 16/903 (2019.01); G06F 40/30 (2020.01)
CPC G06F 16/90335 (2019.01) [G06F 40/30 (2020.01)]
20 Claims
 
1. An apparatus comprising:
at least one processing platform comprising at least one processor coupled to at least one memory, the at least one processing platform, when executing program code, is configured to:
collect data from a plurality of data sources;
extract structured data and unstructured data from the collected data using an unsupervised machine learning model, wherein the extracting comprises:
selecting and processing a portion of the unstructured data using one or more unstructured data filters based on one or more natural language processing metrics;
selecting a portion of the structured data using one or more structured data filters; and
applying the unsupervised machine learning model to the selected and processed portion of the unstructured data to determine one or more topics of the unstructured data;
form a plurality of sub-graph structures comprising a sub-graph structure for each of the data sources based on at least the selected portion of the structured data and the selected and processed portion of the unstructured data;
compute, for a given node of a given sub-graph structure, a title embedding based on a title portion of the selected portion of structured data and a topic embedding based on the one or more topics of the unstructured data and one or more other portions of the selected portion of the structured data;
combine the plurality of sub-graph structures to form a combined graph structure representing the collected data from the plurality of data sources, wherein combining the plurality of sub-graph structures to form a combined graph structure further comprises:
using a graph and report generator to measure a distance between two nodes and adding an edge between the two nodes when the distance is at or below a given distance threshold value; and
apply the combined graph structure to one or more user tasks.
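
The combining step recited above (measure a distance between two nodes, add an edge when the distance is at or below a threshold) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the names `cosine_distance` and `combine_subgraphs`, the use of cosine distance as the metric, the representation of each node as a single vector (for example, a title embedding concatenated with a topic embedding), and the choice to compare only nodes from different data sources.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity; the metric is an assumed choice."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def combine_subgraphs(subgraphs, threshold):
    """Merge per-source sub-graphs and link close nodes across sources.

    subgraphs: {source_name: {"nodes": {node_id: vector},
                              "edges": [(node_id, node_id), ...]}}
    Returns (nodes, edges), where nodes maps (source, node_id) to its
    vector and edges is a set of frozenset pairs of node keys.
    """
    nodes = {}
    edges = set()
    # Carry over every node and existing edge, keyed by source name.
    for source, sub in subgraphs.items():
        for node_id, vector in sub["nodes"].items():
            nodes[(source, node_id)] = vector
        for a, b in sub["edges"]:
            edges.add(frozenset({(source, a), (source, b)}))
    # Add an edge when the distance is at or below the threshold.
    for (key_a, vec_a), (key_b, vec_b) in combinations(nodes.items(), 2):
        if key_a[0] == key_b[0]:
            continue  # assumption: only link nodes from different sources
        if cosine_distance(vec_a, vec_b) <= threshold:
            edges.add(frozenset({key_a, key_b}))
    return nodes, edges


# Hypothetical data: two sources with toy two-dimensional embeddings.
subgraphs = {
    "tickets": {"nodes": {"t1": [1.0, 0.0], "t2": [0.0, 1.0]},
                "edges": [("t1", "t2")]},
    "docs": {"nodes": {"d1": [0.9, 0.1]}, "edges": []},
}
nodes, edges = combine_subgraphs(subgraphs, threshold=0.1)
```

In this toy input, `t1` and `d1` point in nearly the same direction, so they receive a cross-source edge, while `t2` and `d1` do not. The pairwise comparison is quadratic in the number of nodes; a larger graph would likely call for an approximate nearest-neighbor index, which is outside the scope of this sketch.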