CPC G06F 16/25 (2019.01) [G06F 16/285 (2019.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06N 20/00 (2019.01)]
14 Claims
1. A computer-implemented method comprising:
identifying, by one or more processors, a first graph-based database representation of a first database, wherein the first graph-based database representation describes one or more first referential key relationships across one or more first table columns of the first database, wherein the first database is associated with a set of first database table data objects of a set of total table data objects;
identifying, by the one or more processors, a second graph-based database representation of a second database, wherein the second graph-based database representation describes one or more second referential key relationships across one or more second table columns of the second database, wherein the second database is associated with a set of second database table data objects of the set of total table data objects;
determining, by the one or more processors, a predicted cross-database structural similarity score for the first database and the second database based at least in part on a structural similarity between the first graph-based database representation of the first database and the second graph-based database representation of the second database;
determining, by the one or more processors, a predicted cross-database similarity score for the first database and the second database based at least in part on the predicted cross-database structural similarity score, wherein the predicted cross-database similarity score is adjusted based at least in part on a predicted cross-database data similarity score for the first database and the second database, and the predicted cross-database data similarity score is determined based at least in part on a data matching table count of a data matching table subset of the set of total table data objects; and
initiating, by the one or more processors, a performance of a prediction-based action based at least in part on the predicted cross-database similarity score.
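The following is a minimal Python sketch of how the scores recited in claim 1 might be computed. The claim does not fix any of the concrete choices below, so each one is an assumption: each graph-based representation is taken as a set of edges between hypothetical "table.column" nodes; structural similarity is approximated by degree-distribution overlap; a table counts as "data matching" when both databases hold a table with the same name and an identical row set; the data similarity score is the data matching table count divided by the total table count; the adjustment is a linear blend with a hypothetical weight `alpha`; and the prediction-based action is a hypothetical threshold rule. All function names are illustrative only.

```python
from collections import Counter

# Assumed representation: a referential key relationship is an edge
# between two hypothetical "table.column" node names.
Edge = tuple[str, str]


def degree_histogram(edges: set[Edge]) -> Counter:
    """Count how many column nodes have each degree in the key graph."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return Counter(degree.values())


def structural_similarity(edges_a: set[Edge], edges_b: set[Edge]) -> float:
    """Degree-distribution overlap in [0, 1]; an assumed stand-in metric."""
    hist_a, hist_b = degree_histogram(edges_a), degree_histogram(edges_b)
    nodes_a, nodes_b = sum(hist_a.values()), sum(hist_b.values())
    if max(nodes_a, nodes_b) == 0:
        return 1.0  # two empty graphs are trivially identical
    # Counter "&" keeps the minimum count per degree value.
    overlap = sum((hist_a & hist_b).values())
    return overlap / max(nodes_a, nodes_b)


def data_similarity(tables_a: dict[str, frozenset],
                    tables_b: dict[str, frozenset]) -> float:
    """Data matching table count divided by the total table count."""
    total = set(tables_a) | set(tables_b)  # the set of total table data objects
    if not total:
        return 0.0
    matching = {
        name for name in total
        if name in tables_a and name in tables_b
        and tables_a[name] == tables_b[name]
    }
    return len(matching) / len(total)


def cross_database_similarity(structural: float, data: float,
                              alpha: float = 0.5) -> float:
    """Structural score adjusted by the data score (hypothetical weight alpha)."""
    return alpha * structural + (1 - alpha) * data


def prediction_based_action(score: float, threshold: float = 0.8) -> str:
    """Hypothetical action rule: flag highly similar databases for review."""
    return "flag_for_consolidation" if score >= threshold else "no_action"
```

For example, two schemas that each contain two independent foreign keys have identical degree histograms and thus a structural score of 1.0, but if only one of three distinct tables matches on data, the blended score with `alpha = 0.5` is about 0.67, which falls below the assumed 0.8 threshold. A linear blend is used here only because it is the simplest way to "adjust" one score by another; a learned model would be an equally plausible reading of the claim.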