| CPC G06F 16/901 (2019.01) [G06F 7/14 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
processing a plurality of raw datasets for generating index metadata and associating one or more index defining the index metadata to respective ones of the plurality of raw datasets, wherein the respective ones of the plurality of raw datasets define respective metadata augmented datasets by the associating of the one or more index thereto, wherein the associating of the one or more index defining the index metadata to the respective ones of the plurality of raw datasets includes associating to the respective ones of the plurality of raw datasets a column association index that specifies a strength of column associations between columns of the respective ones of the plurality of raw datasets with respective strength values, wherein a strength value of a column association between a first column and a second column in a respective raw dataset is based on a frequency in which the first column and the second column appear together in a same dataset in a corpus of stored datasets having at least one of the first column or the second column;
examining augmented datasets of the metadata augmented datasets in dependence on metadata of the index metadata and in dependence on user defined input data specifying parameter values of a production dataset to be produced; and
merging first and second ones of the augmented datasets in selected groups of augmented datasets in dependence on the examining of the augmented datasets, wherein the merging of the first and second ones of the augmented datasets in dependence on the examining of the augmented datasets is performed in support of preparing the production dataset, and wherein the merging of the first and second ones of the augmented datasets is performed in dependence on an evaluation of the column association index of the first augmented dataset in reference to the column association index of the second augmented dataset.
|
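
The strength value recited in claim 1 can be illustrated with a short sketch. The claim says only that the value is "based on" how often two columns appear together in a dataset, among the stored datasets that have at least one of them. The ratio below (datasets with both columns divided by datasets with either column) is one assumed reading, not the patented computation. The names `association_strength` and `column_association_index`, and the sample corpus, are hypothetical.

```python
from itertools import combinations


def association_strength(col_a, col_b, corpus):
    """Assumed formula: among corpus datasets that contain col_a or col_b,
    return the fraction that contain both columns."""
    either = [cols for cols in corpus if col_a in cols or col_b in cols]
    if not either:
        return 0.0  # neither column occurs anywhere in the corpus
    both = sum(1 for cols in either if col_a in cols and col_b in cols)
    return both / len(either)


def column_association_index(columns, corpus):
    """Map every unordered column pair of one raw dataset to its strength value."""
    return {
        frozenset(pair): association_strength(pair[0], pair[1], corpus)
        for pair in combinations(sorted(columns), 2)
    }


# Hypothetical corpus: each stored dataset is reduced to its set of column names.
corpus = [
    {"id", "name", "email"},
    {"id", "name"},
    {"id", "price"},
    {"sku", "price"},
]

# Index metadata that would be associated with the first raw dataset.
index = column_association_index({"id", "name", "email"}, corpus)
for pair, strength in index.items():
    print(sorted(pair), round(strength, 3))
```

Keying the index by an unordered pair (`frozenset`) reflects that the claim describes the association between a first and a second column without assigning it a direction.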