CPC G06F 16/24544 (2019.01) [G06F 16/221 (2019.01); G06F 16/24537 (2019.01)] | 19 Claims |
1. A method for identifying table joins, comprising:
obtaining respective casting similarities between pairs of columns of a first table and a second table, wherein a pair of columns of the pairs of columns comprises a first column of the first table and a second column of the second table, and wherein a casting similarity for the pair of columns is obtained by steps comprising:
assigning the casting similarity to the pair of columns based on an identified extent to which first data values of the first column are changeable to a data type of the second column, wherein the casting similarity is selected from a set comprising a ‘very low’ casting similarity, and wherein the ‘very low’ casting similarity is assigned to a given pair of columns in a case that one column of the given pair of columns has a BOOLEAN type and the other column of the given pair of columns has a FLOAT type;
discarding, to obtain first join candidates, ones of the pairs of columns having the respective casting similarities not satisfying a casting similarity threshold;
obtaining respective string similarities for the first join candidates;
discarding ones of the first join candidates not satisfying a string similarity condition to obtain second join candidates;
obtaining final join candidates using the respective casting similarities and the respective string similarities of the second join candidates, wherein each of the final join candidates includes a column of the first table and a column of the second table;
presenting the final join candidates on a device of a user;
receiving, from the device of the user, a selected join candidate of the final join candidates;
querying a database based on a data query that includes a join of the first table and the second table to obtain tabular data, wherein the join is based on the selected join candidate; and
outputting the tabular data.
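The filtering steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The claim fixes only one assignment (a BOOLEAN/FLOAT pair receives a 'very low' casting similarity). Everything else here is an assumption made for illustration: the other similarity levels and their cutoffs, the use of column names as the compared strings, the `difflib` ratio as the string similarity, the thresholds, and the way the two scores are combined for ranking. The function names and the column dictionaries are hypothetical. The presenting, selecting, and querying steps are omitted.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical ordinal scale; the claim itself names only 'very low'.
LEVELS = {"very low": 0, "low": 1, "medium": 2, "high": 3}


def castable(value, target_type):
    """Return True if a single value can be changed to target_type."""
    text = str(value)
    try:
        if target_type == "INTEGER":
            int(text)
        elif target_type == "FLOAT":
            float(text)
        elif target_type == "BOOLEAN":
            return text.lower() in {"true", "false", "0", "1"}
        return True  # any other type (e.g. STRING) is assumed to accept every value
    except ValueError:
        return False


def casting_similarity(first, second):
    """Assign a level based on how many values of `first` cast to the type of `second`."""
    if {first["type"], second["type"]} == {"BOOLEAN", "FLOAT"}:
        return "very low"  # the one assignment stated in the claim
    values = first["values"]
    share = sum(castable(v, second["type"]) for v in values) / len(values) if values else 0.0
    # Assumed cutoffs, chosen only for this sketch.
    if share == 1.0:
        return "high"
    if share >= 0.5:
        return "medium"
    return "low"


def string_similarity(name_a, name_b):
    """Assumption: compare column names with difflib's ratio (0.0 to 1.0)."""
    return SequenceMatcher(None, name_a, name_b).ratio()


def join_candidates(table_a, table_b, cast_threshold="medium", string_threshold=0.5):
    """Return ranked (first-table column, second-table column) name pairs."""
    pairs = [(a, b, casting_similarity(a, b)) for a, b in product(table_a, table_b)]
    # First join candidates: discard pairs below the casting similarity threshold.
    first = [p for p in pairs if LEVELS[p[2]] >= LEVELS[cast_threshold]]
    # Second join candidates: discard pairs failing the string similarity condition.
    scored = [(a, b, c, string_similarity(a["name"], b["name"])) for a, b, c in first]
    second = [p for p in scored if p[3] >= string_threshold]
    # Final join candidates: rank by an assumed combination of both scores.
    ranked = sorted(second, key=lambda p: LEVELS[p[2]] / 3 + p[3], reverse=True)
    return [(a["name"], b["name"]) for a, b, _, _ in ranked]


orders = [
    {"name": "customer_id", "type": "STRING", "values": ["1", "2", "3"]},
    {"name": "is_paid", "type": "BOOLEAN", "values": ["true", "false"]},
]
customers = [
    {"name": "cust_id", "type": "INTEGER", "values": [1, 2, 3]},
    {"name": "balance", "type": "FLOAT", "values": [1.5, 2.0]},
]

print(join_candidates(orders, customers))
```

In this example the `is_paid`/`balance` pair is dropped at the casting stage because of its 'very low' level, and the `customer_id`/`balance` pair is dropped at the string stage because the names have little in common.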