CPC G06F 16/219 (2019.01) [G06F 16/215 (2019.01); G06F 18/22 (2023.01)] | 16 Claims |
1. A computer-implemented method for determining duplicates in a graph in a hybrid master data management system based on different versions of the graph, comprising:
providing a first version of a graph, with the first version of the graph being a previous version of the graph stored on a virtual master data management (MDM) system, the virtual MDM system being configured to store and create data in a distributed arrangement across one or more source systems;
identifying at least two target nodes of the graph, wherein each node of the at least two target nodes has a set of entity attributes and for each entity attribute of the set of entity attributes:
comparing each version of an entity attribute of a first target node with each version of the entity attribute of a second target node, each comparison resulting in an individual data similarity score;
weighting, with a penalty weight, the individual data similarity scores resulting from a comparison that involves a first version of the entity attribute that is different from a second version of the entity attribute;
selecting a highest data similarity score of the individual data similarity scores of the entity attribute; and
combining the selected highest data similarity scores of the set of entity attributes for obtaining a comparison score;
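The per-attribute steps above (all-pairs version comparison, penalty weighting, selecting the maximum, combining) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the similarity metric (`difflib.SequenceMatcher.ratio`), the penalty value `0.8`, the reading that the penalty applies whenever either compared value is a non-current version, and combination by arithmetic mean are all hypothetical choices. The names `attribute_score` and `comparison_score` are likewise illustrative.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical penalty weight; the claim does not specify a value.
PENALTY_WEIGHT = 0.8


def similarity(a: str, b: str) -> float:
    """Individual data similarity score in [0, 1] (assumed metric: difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def attribute_score(versions_a, versions_b, penalty=PENALTY_WEIGHT):
    """Compare every version of one attribute on node A with every version on node B.

    Each version is a (value, is_current) pair. A score is multiplied by the
    penalty weight when either side is a non-current version (an assumed
    reading of the penalty condition). The highest weighted score is returned.
    """
    best = 0.0
    for (val_a, cur_a), (val_b, cur_b) in product(versions_a, versions_b):
        score = similarity(val_a, val_b)
        if not (cur_a and cur_b):
            score *= penalty
        best = max(best, score)
    return best


def comparison_score(node_a, node_b, penalty=PENALTY_WEIGHT):
    """Combine per-attribute maxima (assumed: arithmetic mean over shared attributes)."""
    shared = sorted(node_a.keys() & node_b.keys())
    if not shared:
        return 0.0
    total = sum(attribute_score(node_a[k], node_b[k], penalty) for k in shared)
    return total / len(shared)


if __name__ == "__main__":
    node_a = {"name": [("Jon Smith", False), ("John Smith", True)],
              "city": [("Boston", True)]}
    node_b = {"name": [("John Smith", True)], "city": [("Boston", True)]}
    print(comparison_score(node_a, node_b))
```

Taking the maximum after penalization means an exact match on an outdated value can still contribute, but it can never outrank an equally good match between current values.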
comparing the first version and a second version of the graph for determining the comparison score indicative of a similarity between the two target nodes, the second version being a current version of the graph stored on a physical MDM system, the physical MDM system being configured to store and create data in a centralized system; and
using the comparison score for determining whether the two target nodes are duplicates with respect to each other.
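The final two steps, drawing attribute versions from the previous graph version (virtual MDM) and the current graph version (physical MDM) and then using the comparison score for the duplicate decision, might look roughly like the sketch below. It assumes each graph version is a plain dict mapping node ids to `{attribute: value}`, treats values from the current version as current and differing values from the previous version as non-current, and uses a hypothetical threshold of `0.85`. `attribute_versions`, `comparison_score` and `are_duplicates` are illustrative names, and the scoring repeats the assumed difflib-plus-penalty scheme in compact form so the block runs on its own.

```python
from difflib import SequenceMatcher

# Hypothetical parameters; the claim specifies neither value.
PENALTY_WEIGHT = 0.8
DUPLICATE_THRESHOLD = 0.85


def attribute_versions(prev_graph, curr_graph, node_id):
    """Collect (value, is_current) versions per attribute from both graph versions.

    prev_graph stands in for the first (previous) version on the virtual MDM
    system; curr_graph stands in for the second (current) version on the
    physical MDM system.
    """
    versions = {}
    for attr, value in curr_graph.get(node_id, {}).items():
        versions.setdefault(attr, []).append((value, True))
    for attr, value in prev_graph.get(node_id, {}).items():
        # Keep an older value only if it differs from the current one.
        if (value, True) not in versions.get(attr, []):
            versions.setdefault(attr, []).append((value, False))
    return versions


def comparison_score(va, vb, penalty=PENALTY_WEIGHT):
    """Mean over shared attributes of the best penalty-weighted similarity."""
    shared = va.keys() & vb.keys()
    if not shared:
        return 0.0
    total = 0.0
    for attr in shared:
        total += max(
            SequenceMatcher(None, a.lower(), b.lower()).ratio()
            * (1.0 if (ca and cb) else penalty)
            for a, ca in va[attr]
            for b, cb in vb[attr]
        )
    return total / len(shared)


def are_duplicates(prev_graph, curr_graph, id_a, id_b, threshold=DUPLICATE_THRESHOLD):
    """Return (is_duplicate, score) for two target nodes."""
    score = comparison_score(
        attribute_versions(prev_graph, curr_graph, id_a),
        attribute_versions(prev_graph, curr_graph, id_b),
    )
    return score >= threshold, score


if __name__ == "__main__":
    prev = {"n1": {"name": "Jon Smith", "city": "Boston"},
            "n2": {"name": "John Smith", "city": "Boston"}}
    curr = {"n1": {"name": "John Smyth", "city": "Boston"},
            "n2": {"name": "John Smith", "city": "Boston"}}
    print(are_duplicates(prev, curr, "n1", "n2"))
```

With these inputs, the name comparison between the two current values ("John Smyth" vs "John Smith") scores 0.9 and the city matches exactly, so the mean of 0.95 clears the assumed threshold. In practice the threshold would have to be tuned to the data.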