US 12,067,023 B2
Machine learning techniques for efficient data pattern recognition across databases
Swapna Sourav Rout, Bangalore (IN); Ankit Varshney, Delhi (IN); Sudeep Choudhary, Jharia (IN); Ravi Kumar Raju Gottumukkala, Bengaluru (IN); and Sreedhar Terala, Maple Grove, MN (US)
Assigned to Optum, Inc., Minnetonka, MN (US)
Filed by Optum, Inc., Minnetonka, MN (US)
Filed on Aug. 26, 2021, as Appl. No. 17/412,651.
Prior Publication US 2023/0062114 A1, Mar. 2, 2023
Int. Cl. G06F 17/00 (2019.01); G06F 16/25 (2019.01); G06F 16/28 (2019.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06N 20/00 (2019.01)

CPC G06F 16/25 (2019.01) [G06F 16/285 (2019.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06N 20/00 (2019.01)]

14 Claims

1. A computer-implemented method comprising:

identifying, by one or more processors, a first graph-based database representation of a first database, wherein the first graph-based database representation describes one or more first referential key relationships across one or more first table columns of the first database, wherein the first database is associated with a set of first database table data objects of a set of total table data objects;

identifying, by the one or more processors, a second graph-based database representation of a second database, wherein the second graph-based database representation describes one or more second referential key relationships across one or more second table columns of the second database, wherein the second database is associated with a set of second database table data objects of the set of total table data objects;

determining, by the one or more processors, a predicted cross-database structural similarity score for the first database and the second database based at least in part on a structural similarity between the first graph-based database representation of the first database and the second graph-based database representation of the second database;

determining, by the one or more processors, a predicted cross-database similarity score for the first database and the second database based at least in part on the predicted cross-database structural similarity score, wherein the predicted cross-database similarity score is adjusted based at least in part on a predicted cross-database data similarity score for the first database and the second database and the predicted cross-database data similarity score is determined based at least in part on a data matching table count of a data matching table subset of the set of total table data objects; and

initiating, by the one or more processors, a performance of a prediction-based action based at least in part on the predicted cross-database similarity score.
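
The steps of claim 1 can be illustrated with a minimal sketch. The claim does not specify how the structural similarity, the data similarity, or the adjustment are computed, so everything below is an assumption made for illustration only: each graph is a list of directed edges between (table, column) nodes, structural similarity is a multiset Jaccard index over per-node degree pairs, a table counts as "data matching" when its row set equals that of some table in the other database, the adjustment is a weighted average, and the prediction-based action is a simple threshold flag. All function names, the `data_weight` parameter, the threshold, and the sample data are hypothetical.

```python
from collections import Counter


def degree_signature(edges):
    """Multiset of (out_degree, in_degree) pairs, one per column node.

    edges: iterable of (referencing_column, referenced_column) pairs, where
    each column is a (table, column) tuple. Hypothetical representation.
    """
    out_deg, in_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(out_deg) | set(in_deg)
    return Counter((out_deg[n], in_deg[n]) for n in nodes)


def structural_similarity(edges_a, edges_b):
    """Assumed metric: multiset Jaccard index of the two degree signatures."""
    sig_a, sig_b = degree_signature(edges_a), degree_signature(edges_b)
    union = sum((sig_a | sig_b).values())
    # Two empty graphs are treated as identical (an assumption).
    return sum((sig_a & sig_b).values()) / union if union else 1.0


def data_similarity(tables_a, tables_b):
    """Assumed metric: data matching table count / total table count.

    tables_*: dict mapping table name -> set of row tuples.
    """
    total = len(tables_a) + len(tables_b)
    if total == 0:
        return 0.0
    contents_a = {frozenset(rows) for rows in tables_a.values()}
    contents_b = {frozenset(rows) for rows in tables_b.values()}
    matching = sum(frozenset(rows) in contents_b for rows in tables_a.values())
    matching += sum(frozenset(rows) in contents_a for rows in tables_b.values())
    return matching / total


def cross_database_similarity(structural, data, data_weight=0.5):
    """Assumed adjustment: weighted average of the two scores."""
    return (1 - data_weight) * structural + data_weight * data


def choose_action(score, threshold=0.8):
    """Assumed prediction-based action: flag the pair above a threshold."""
    return "flag_for_review" if score >= threshold else "no_action"


# Hypothetical sample databases with the same key structure but different names.
edges_a = [
    (("orders", "customer_id"), ("customers", "id")),
    (("items", "order_id"), ("orders", "id")),
]
edges_b = [
    (("purchase", "client_ref"), ("client", "pk")),
    (("line", "purchase_ref"), ("purchase", "pk")),
]
tables_a = {
    "customers": {(1, "Ann"), (2, "Bo")},
    "orders": {(10, 1)},
    "items": {(100, 10)},
}
tables_b = {
    "client": {(1, "Ann"), (2, "Bo")},
    "purchase": {(10, 1)},
    "line": {(200, 10)},
}

structural = structural_similarity(edges_a, edges_b)
data = data_similarity(tables_a, tables_b)
score = cross_database_similarity(structural, data)
action = choose_action(score)
print(structural, data, score, action)
```

In this sample the two graphs share the same degree signature, and four of the six tables have matching row sets, so the combined score clears the assumed threshold. A degree signature ignores table and column names, which suits cross-database comparison where naming differs, but it is a coarse stand-in for whatever structural comparison an actual embodiment would use.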