CPC G06F 16/2456 (2019.01) [G06F 16/221 (2019.01); G06F 16/248 (2019.01); G06F 16/254 (2019.01)] | 20 Claims |
1. A method comprising, at a computer system:
accessing a first plurality of columns in a first dataset stored in a first data source;
accessing a second plurality of columns in a second dataset stored in a second data source;
identifying a plurality of column pairs between the first dataset and the second dataset, wherein each column pair in the plurality of column pairs includes a different one of the first plurality of columns and a different one of the second plurality of columns, and wherein all possible pairs of column pairs are identified between the first dataset and the second dataset;
determining one or more column pairs from the plurality of identified column pairs to exclude;
excluding at least one column pair from the one or more determined column pairs; and
for each of the one or more column pairs remaining after the excluding step:
based on a type of join specified via a graphical interface, computing a plurality of scores for a column pair, each of the plurality of scores computed based on a different one of a plurality of scoring functions, the score indicating a measure for joining columns in the column pair.
|
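For illustration only, the steps recited in claim 1 can be sketched in Python as follows. The claim does not name any particular scoring functions, exclusion criteria, or data structures, so everything concrete here is an assumption: the set-overlap scorers, the rule that excludes pairs whose value types differ, the `SCORERS_BY_JOIN` table, and all function and dataset names are hypothetical. The join type is passed as a plain string rather than read from a graphical interface.

```python
from itertools import product

# Hypothetical scoring functions: each measures value overlap between two columns.
def jaccard(left, right):
    a, b = set(left), set(right)
    return len(a & b) / len(a | b) if a | b else 0.0

def left_containment(left, right):
    a, b = set(left), set(right)
    return len(a & b) / len(a) if a else 0.0

def right_containment(left, right):
    a, b = set(left), set(right)
    return len(a & b) / len(b) if b else 0.0

# Assumed mapping from join type to the scoring functions applied for it.
SCORERS_BY_JOIN = {
    "inner": {"jaccard": jaccard,
              "left_containment": left_containment,
              "right_containment": right_containment},
    "left": {"jaccard": jaccard, "left_containment": left_containment},
    "right": {"jaccard": jaccard, "right_containment": right_containment},
}

def column_type(values):
    """Return the single Python type of a column, or `object` if mixed or empty."""
    kinds = {type(v) for v in values}
    return kinds.pop() if len(kinds) == 1 else object

def score_column_pairs(first, second, join_type):
    """Score candidate join column pairs between two datasets.

    `first` and `second` map column names to lists of values.
    """
    # Identify every (first column, second column) pair.
    pairs = list(product(first, second))

    # Determine pairs to exclude. Assumed rule: value types do not match.
    to_exclude = {
        (l, r) for l, r in pairs
        if column_type(first[l]) is not column_type(second[r])
    }

    # Compute several scores for each remaining pair, chosen by join type.
    scorers = SCORERS_BY_JOIN[join_type]
    return {
        (l, r): {name: fn(first[l], second[r]) for name, fn in scorers.items()}
        for l, r in pairs
        if (l, r) not in to_exclude
    }

# Hypothetical example datasets.
orders = {"customer_id": [1, 2, 3, 4],
          "region": ["east", "west", "east", "north"]}
customers = {"id": [2, 3, 4, 5],
             "name": ["Ann", "Bo", "Cy", "Di"]}

scores = score_column_pairs(orders, customers, "inner")
```

In this sketch, the pairs mixing integer and string columns are dropped before scoring, and each surviving pair receives one score per scoring function registered for the chosen join type.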