US 12,326,871 B2
Detecting duplicate tables in data lake databases
Karina Elayne Kervin, Sacramento, CA (US); Jian Wu, Round Rock, TX (US); Sibasis Das, Kolkata (IN); Radha Mohan De, Howrah (IN); Swaminathan Balasubramanian, Troy, MI (US); and Cheranellore Vasudevan, Bastrop, TX (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Nov. 13, 2023, as Appl. No. 18/507,769.
Prior Publication US 2025/0156435 A1, May 15, 2025
Int. Cl. G06F 16/20 (2019.01); G06F 16/215 (2019.01); G06F 16/22 (2019.01); G06F 16/25 (2019.01); G06N 5/022 (2023.01)
CPC G06F 16/254 (2019.01) [G06F 16/215 (2019.01); G06F 16/2246 (2019.01); G06N 5/022 (2013.01)]
20 Claims
 
1. A computer implemented method of detecting duplicate tables comprising:
converting syntactical data from relational tables from a schema of a database to knowledge graphs having semantic data;
mapping nodes in the knowledge graphs to sources in the relational tables;
applying graph matching to the knowledge graphs;
detecting semantically equivalent tables from the relational tables by assessing a degree of matching between matched knowledge graphs based on equivalent data from common linked-pair nodes, defined from relationships, from the knowledge graphs; and
generating a corrective flag to eliminate the semantically equivalent tables from the relational tables from the schema of the database based on the degree of matching.
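
The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. The schema, the column-to-concept labels, the rule that the first column acts as the key node, the Jaccard score used as the "degree of matching", the 0.8 threshold, and all function names are assumptions introduced here purely for illustration.

```python
from itertools import combinations

# Hypothetical input: table -> {column: semantic concept}. The concept labels
# stand in for a semantic annotation step that the claim does not specify.
SCHEMA = {
    "customers": {"cust_id": "CustomerID", "full_name": "PersonName", "zip": "PostalCode"},
    "clients": {"client_no": "CustomerID", "name": "PersonName", "postcode": "PostalCode"},
    "orders": {"order_id": "OrderID", "cust_id": "CustomerID", "total": "Amount"},
}


def table_to_graph(table, columns):
    """Return (edges, sources) for one table.

    edges: set of linked node pairs (key concept, relation, attribute concept).
    sources: node -> (table, column), mapping each graph node to its source.
    """
    sources = {concept: (table, column) for column, concept in columns.items()}
    concepts = list(columns.values())
    key, attributes = concepts[0], concepts[1:]  # assumption: first column is the key
    edges = {(key, "has", attr) for attr in attributes}
    return edges, sources


def degree_of_matching(edges_a, edges_b):
    """Jaccard overlap of linked pairs (an assumed scoring rule)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


def find_duplicates(schema, threshold=0.8):
    """Compare every pair of table graphs and flag pairs at or above the threshold."""
    graphs = {name: table_to_graph(name, cols) for name, cols in schema.items()}
    flags = []
    for a, b in combinations(sorted(graphs), 2):
        (edges_a, src_a), (edges_b, src_b) = graphs[a], graphs[b]
        score = degree_of_matching(edges_a, edges_b)
        if score >= threshold:
            # Nodes that appear in the common linked pairs, traced back to their columns.
            common_nodes = sorted({n for e in edges_a & edges_b for n in (e[0], e[2])})
            flags.append({
                "tables": (a, b),
                "degree": score,
                "columns": {n: (src_a[n], src_b[n]) for n in common_nodes},
                "action": "review_for_removal",  # the corrective flag
            })
    return flags


if __name__ == "__main__":
    for flag in find_duplicates(SCHEMA):
        print(flag["tables"], flag["degree"], flag["action"])
```

In this toy schema, `customers` and `clients` use different column names but yield the same linked pairs, so they are flagged, while `orders` shares only a single node with them and is not. A real system would presumably use a richer graph-matching procedure than set overlap.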