CPC G06F 16/9024 (2019.01) [G06F 16/284 (2019.01); G06N 20/00 (2019.01)] | 20 Claims |
1. A method for generating a unified knowledge graph, comprising:
receiving entity data from a data source comprising a plurality of nodes;
forming a plurality of type-specific groups of nodes based on the received entity data;
for each respective type-specific group of nodes of the plurality of type-specific groups of nodes:
disambiguating the nodes within the respective type-specific group of nodes to identify one or more sets of related nodes representing a single entity within the respective type-specific group of nodes, wherein disambiguating the nodes comprises:
determining a blocked data set from the entity data in the respective type-specific group of nodes based on one or more blocking parameters common to each member of the blocked data set and not associated with members of the respective type-specific group of nodes that are not included in the blocked data set; and
refining the blocked data set based on a machine learning model trained to identify similar entities in the blocked data set;
creating a master node representing the single entity for every set of related nodes of the one or more sets of related nodes;
creating entity relationships between the master node for each respective set of related nodes and each of the nodes in the respective set of related nodes; and
exporting the master node for every set of related nodes of the one or more sets of related nodes, the entity relationships, and the nodes within the respective type-specific group of nodes to a type-specific subgraph; and
forming a unified knowledge graph based on a plurality of type-specific subgraphs, wherein,
the unified knowledge graph is a queryable graph database and comprises only master nodes associated with each set of related nodes in each type-specific subgraph, and
a number of nodes in the unified knowledge graph is less than a sum of the number of nodes in each type-specific subgraph.
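The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The node layout (`id`, `type`, `name`), the blocking parameter (first letter of the name), and every function name are hypothetical. In particular, `similar` is a plain string-similarity stand-in for the trained machine learning model recited in the claim, and a node with no match is treated as a set of one related node.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.8):
    """Stand-in for the trained model (assumption): name string similarity."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def block_key(node):
    """Hypothetical blocking parameter: first letter of the lowercased name."""
    return node["name"].lower()[:1]


def disambiguate(nodes):
    """Return lists of node ids that represent the same entity in one group."""
    # Blocking: only nodes sharing a blocking key are compared pairwise.
    blocks = defaultdict(list)
    for node in nodes:
        blocks[block_key(node)].append(node)

    # Union-find over node ids to merge pairs judged similar.
    parent = {n["id"]: n["id"] for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Refinement: apply the similarity check inside each block.
    for members in blocks.values():
        for a, b in combinations(members, 2):
            if similar(a, b):
                parent[find(a["id"])] = find(b["id"])

    related = defaultdict(list)
    for n in nodes:
        related[find(n["id"])].append(n["id"])
    return list(related.values())


def build_subgraph(node_type, nodes):
    """Create master nodes and master-to-node relationships for one type."""
    masters, edges = [], []
    for i, related in enumerate(disambiguate(nodes)):
        master_id = f"master:{node_type}:{i}"
        masters.append({"id": master_id, "type": node_type})
        edges.extend((master_id, node_id) for node_id in related)
    return {"masters": masters, "edges": edges, "nodes": nodes}


def build_unified_graph(entity_data):
    """Group nodes by type, build each subgraph, keep only master nodes."""
    groups = defaultdict(list)
    for node in entity_data:
        groups[node["type"]].append(node)
    subgraphs = {t: build_subgraph(t, ns) for t, ns in groups.items()}
    unified = [m for sg in subgraphs.values() for m in sg["masters"]]
    return subgraphs, unified
```

Because each subgraph in this sketch holds both the original nodes and their master nodes, while the unified result holds master nodes only, the unified node count is smaller than the summed subgraph node counts whenever any input node exists. A real system would export to a queryable graph database rather than return in-memory lists.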