US 12,007,965 B2
Method, device and storage medium for deduplicating entity nodes in graph database
Yifei Wang, Beijing (CN); Yang Wang, Beijing (CN); and Yu Wang, Beijing (CN)
Assigned to BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed by BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed on May 12, 2022, as Appl. No. 17/663,044.
Claims priority of application No. 202111144175.6 (CN), filed on Sep. 28, 2021.
Prior Publication US 2022/0269659 A1, Aug. 25, 2022
Int. Cl. G06F 16/215 (2019.01); G06F 16/901 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/9014 (2019.01); G06F 16/9024 (2019.01)]
17 Claims
 
1. A method for deduplicating entity nodes in a graph database, wherein the graph database comprises at least one knowledge graph, the graph database is a distributed graph database based on a design of separate architecture for storage and computation, the graph database comprises a plurality of computing nodes and storage nodes, each storage node saves a different part of the knowledge graph and the knowledge graph is obtained by splicing the different parts of the knowledge graph stored in the storage nodes, and the method is performed by a computing node in the plurality of computing nodes and comprises:
obtaining a set of entity nodes to be deduplicated from the knowledge graph, wherein the set comprises a plurality of entity nodes, wherein the computing node performs a multi-step graph walk on the graph database, and an entity node result obtained for a current step from the knowledge graph is determined as the set of entity nodes to be deduplicated;
selecting an untraversed entity node from the set as a target entity node;
determining a range located by a node identifier corresponding to the target entity node;
determining that the target entity node has appeared in traversed entity nodes according to a deduplicating mode corresponding to the range, wherein the traversed entity nodes comprise entity nodes after deduplication in previous steps;
deleting the target entity node from the set;
obtaining a plurality of first bit segments corresponding to the node identifier of the target entity node by performing segmenting on bits of the node identifier of the target entity node, wherein bit lengths of the first bit segments are the same, and some of the first bit segments contain the same bits;
for each first bit segment, obtaining a first bitset corresponding to the first bit segment, wherein a bit length of the first bitset is greater than or equal to the bit length of the first bit segment;
obtaining bits corresponding to a value of the first bit segment from the first bitset; and
determining values of bits corresponding to the first bit segments as first values, wherein the first value is configured to indicate that the value corresponding to the first bit segment has appeared.
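
The segmenting and bitset-marking steps recited at the end of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `split_segments`, `SegmentBitsets`, `mark` and `all_marked` are hypothetical, and the 16-bit identifier width, 8-bit segment length and 4-bit stride are assumptions chosen only so that the segments have equal length and adjacent segments share bits. Each bitset is sized at 2**seg_bits bits, which satisfies the claimed condition that the bitset is at least as long as the segment. The `all_marked` check is likewise an assumption: the claim text above does not say how the bitsets are consulted, and a per-segment check of this kind can report an identifier as seen when its segment values were each marked by different identifiers.

```python
def split_segments(node_id: int, id_bits: int = 16,
                   seg_bits: int = 8, stride: int = 4) -> list[int]:
    """Split the low id_bits of node_id into equal-length segments.

    With stride < seg_bits, adjacent segments overlap, so some
    segments contain the same bits of the identifier.
    """
    mask = (1 << seg_bits) - 1
    return [(node_id >> shift) & mask
            for shift in range(0, id_bits - seg_bits + 1, stride)]


class SegmentBitsets:
    """One bitset per segment position (hypothetical helper class)."""

    def __init__(self, id_bits: int = 16, seg_bits: int = 8, stride: int = 4):
        self.id_bits = id_bits
        self.seg_bits = seg_bits
        self.stride = stride
        n_segments = len(split_segments(0, id_bits, seg_bits, stride))
        # Each bitset is a Python int used as 2**seg_bits individual bits.
        self.bitsets = [0] * n_segments

    def _segments(self, node_id: int) -> list[int]:
        return split_segments(node_id, self.id_bits, self.seg_bits, self.stride)

    def mark(self, node_id: int) -> None:
        """Set, in each segment's bitset, the bit indexed by the segment value."""
        for i, value in enumerate(self._segments(node_id)):
            self.bitsets[i] |= 1 << value

    def all_marked(self, node_id: int) -> bool:
        """Assumed check: True if every segment value has been marked before."""
        return all((self.bitsets[i] >> value) & 1
                   for i, value in enumerate(self._segments(node_id)))


seen = SegmentBitsets()
step_result = [0xABCD, 0x1234, 0xABCD]   # made-up node identifiers
deduplicated = []
for node_id in step_result:
    if seen.all_marked(node_id):
        continue                          # treated as a duplicate, dropped
    seen.mark(node_id)
    deduplicated.append(node_id)
```

In this sketch the second occurrence of `0xABCD` is dropped because all three of its segment values are already marked. Identifier width, segment length and stride would in practice depend on the identifier range the claim refers to.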