US 11,934,390 B2
Approaches for knowledge graph pruning based on sampling and information gain theory
Teresa Sheausan Tung, San Jose, CA (US); Colin Anil Puri, San Jose, CA (US); and Zhijie Wang, Fremont, CA (US)
Assigned to Accenture Global Solutions Limited, Dublin (IE)
Filed by Accenture Global Solutions Limited, Dublin (IE)
Filed on Aug. 5, 2019, as Appl. No. 16/531,711.
Application 16/531,711 is a continuation of application No. 16/520,611, filed on Jul. 24, 2019, granted, now 11,693,848.
Claims priority of provisional application 62/715,598, filed on Aug. 7, 2018.
Prior Publication US 2020/0050605 A1, Feb. 13, 2020
Int. Cl. G06F 16/242 (2019.01); G06F 16/23 (2019.01); G06F 16/901 (2019.01)
CPC G06F 16/2425 (2019.01) [G06F 16/2379 (2019.01); G06F 16/9024 (2019.01)] 9 Claims
1. A knowledge graph system comprising:
memory for storing instructions; and
a processor in communication with the memory, wherein the processor, when executing the instructions, is configured to:
ingest and store a historical knowledge graph in a memory space by generating a set of entity nodes and edges of the historical knowledge graph from a plurality of data sources according to a historical knowledge graph schema, the historical knowledge graph schema being stored in the memory space and defining entity node types and edge types for creating the historical knowledge graph;
receive a query request;
execute by the processor a depth-first search of the historical knowledge graph in the memory space to identify a plurality of traversal paths in the historical knowledge graph according to historical query requests to form a sampled knowledge graph, the sampled knowledge graph being a subset of the historical knowledge graph;
store numbers of times for traversal of edges of the plurality of traversal paths of the sampled knowledge graph, the numbers of times being extracted based on historical query requests, being generated as counter values, and being stored as metadata of the edges of the plurality of traversal paths of the sampled knowledge graph;
retrieve the metadata of the edges of the plurality of traversal paths of the sampled knowledge graph including the counter values as quantified information gains of the edges of the plurality of traversal paths of the sampled knowledge graph;
remove, from the sampled knowledge graph, one or more entity nodes or edges associated with at least one removal traversal path from the plurality of traversal paths having the quantified information gain lower than a predetermined information gain threshold;
create a pruned knowledge graph in the memory space after removing the one or more entity nodes or edges associated with the at least one removal traversal path;
remove a subset of entity node types and edge types corresponding to the entity nodes and edges of the at least one removal traversal path from the historical knowledge graph schema;
create a pruned knowledge graph schema from the historical knowledge graph schema after removing the subset of entity node types and edge types to replace the historical knowledge graph schema so as to reduce space occupied by the historical knowledge graph schema in the memory space; and
execute the query by traversing the pruned knowledge graph in the reduced memory space in order to decrease traversal time for returning a query result.
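The claimed pipeline (depth-first sampling driven by historical queries, per-edge traversal counters as information gain, threshold pruning of the graph, matching pruning of the schema, then querying the smaller graph) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. Assumed here: historical queries are hypothetical `(source, target)` node pairs; a path's information gain is the smallest counter among its edges; an edge shared with any retained path is kept; and a schema type is dropped only if no retained node or edge still uses it. All names (`KnowledgeGraph`, `dfs_paths`, `sample_and_count`, `prune`) and the toy data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

Edge = tuple[str, str, str]  # (source node id, edge type, target node id)


@dataclass
class KnowledgeGraph:
    node_types: dict[str, str]                            # node id -> entity node type
    edges: dict[Edge, int] = field(default_factory=dict)  # edge -> traversal counter metadata

    def out_edges(self, node: str) -> list[Edge]:
        # Linear scan keeps the sketch short; a real graph store would index adjacency.
        return [e for e in self.edges if e[0] == node]


def dfs_paths(graph: KnowledgeGraph, source: str, target: str, max_depth: int) -> list[list[Edge]]:
    """Enumerate simple source->target paths (as edge lists) with an explicit-stack DFS."""
    paths, stack = [], [(source, [], {source})]
    while stack:
        node, path, seen = stack.pop()
        if node == target and path:
            paths.append(path)
            continue
        if len(path) < max_depth:
            for edge in graph.out_edges(node):
                if edge[2] not in seen:
                    stack.append((edge[2], path + [edge], seen | {edge[2]}))
    return paths


def sample_and_count(graph, historical_queries, max_depth=3):
    """Build the sampled graph from DFS paths of historical queries and count edge traversals.

    Assumption: each historical query is a hypothetical (source, target) pair.
    """
    counters: dict[Edge, int] = defaultdict(int)
    paths: list[tuple[Edge, ...]] = []
    for source, target in historical_queries:
        for path in dfs_paths(graph, source, target, max_depth):
            paths.append(tuple(path))
            for edge in path:
                counters[edge] += 1  # counter value stored as edge metadata
    nodes = {n for edge in counters for n in (edge[0], edge[2])}
    sampled = KnowledgeGraph({n: graph.node_types[n] for n in nodes}, dict(counters))
    return sampled, list(dict.fromkeys(paths))  # de-duplicated, in first-seen order


def prune(sampled, paths, schema, threshold):
    """Remove paths whose gain is below threshold, then prune the schema to match."""
    def gain(path):  # assumption: a path is only as informative as its least-traversed edge
        return min(sampled.edges[e] for e in path)

    kept_edges = {e for p in paths if gain(p) >= threshold for e in p}
    kept_nodes = {n for e in kept_edges for n in (e[0], e[2])}
    pruned = KnowledgeGraph(
        {n: t for n, t in sampled.node_types.items() if n in kept_nodes},
        {e: c for e, c in sampled.edges.items() if e in kept_edges},
    )
    # Drop only the types of removed elements that no retained element still uses.
    removed_edge_types = {e[1] for e in sampled.edges if e not in kept_edges}
    removed_node_types = {t for n, t in sampled.node_types.items() if n not in kept_nodes}
    pruned_schema = {
        "node_types": schema["node_types"] - (removed_node_types - set(pruned.node_types.values())),
        "edge_types": schema["edge_types"] - (removed_edge_types - {e[1] for e in kept_edges}),
    }
    return pruned, pruned_schema


# Hypothetical toy graph: node ids, types, and queries are invented for illustration.
full = KnowledgeGraph(
    {"A": "Person", "B": "Company", "C": "City", "D": "Sensor"},
    {("A", "worksAt", "B"): 0, ("B", "locatedIn", "C"): 0, ("A", "livesIn", "C"): 0,
     ("A", "owns", "D"): 0, ("D", "locatedIn", "C"): 0},
)
schema = {"node_types": {"Person", "Company", "City", "Sensor"},
          "edge_types": {"worksAt", "locatedIn", "livesIn", "owns"}}
history = [("A", "C"), ("A", "B"), ("A", "B"), ("B", "C"), ("B", "C")]

sampled, paths = sample_and_count(full, history)
pruned, pruned_schema = prune(sampled, paths, schema, threshold=2)
# A new query is answered by traversing only the pruned graph.
print(dfs_paths(pruned, "A", "C", max_depth=3))  # [[('A', 'worksAt', 'B'), ('B', 'locatedIn', 'C')]]
```

In this toy run, the edges traversed only once (`livesIn`, `owns`, and the D-to-C `locatedIn` edge) fall below the threshold of 2, so node D and the `Sensor`, `livesIn`, and `owns` types are pruned. The `locatedIn` type survives because the retained B-to-C edge still uses it. Keeping shared edges and still-used types is a design choice of this sketch that keeps the pruned graph consistent with its pruned schema.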