CPC G06F 16/24578 (2019.01) [G06F 16/9024 (2019.01)] | 20 Claims |
1. A method comprising:
identifying metadata for a plurality of datasets;
generating a graph structure in storage, the graph structure comprising a plurality of nodes connected by a plurality of edges, each node of the plurality of nodes representing a respective dataset of the plurality of datasets, the plurality of edges connecting the plurality of nodes according to a data lineage determined from metadata of the plurality of datasets;
generating, by a computer processor, a composite score for each node of the plurality of nodes of the graph structure to generate a plurality of composite scores;
processing, from a storage by the computer processor, the plurality of composite scores for the plurality of nodes of the graph structure to generate a respective dataset rank for each dataset in the plurality of datasets,
wherein processing the plurality of composite scores comprises:
setting the respective dataset rank based on the composite score for the each node of the plurality of nodes;
recursively updating the respective dataset rank of each dataset based on (i) dataset ranks for linked datasets, the linked datasets corresponding to a subset of the plurality of nodes that are directly connected to a node of the plurality of nodes, and further based on (ii) a number of datasets to which the each dataset is linked, wherein:
the dataset ranks comprise the respective dataset rank for each of the linked datasets,
after updating an initial dataset rank, during each subsequent updating of recursively updating the respective dataset rank, using a previously determined dataset rank as input, and
updating continues until a sum of changes of the dataset ranks is below a change threshold;
sorting the plurality of datasets according to the respective dataset rank of the each dataset; and
presenting, in an interface, the plurality of datasets sorted according to the respective dataset rank of the each dataset.
|
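The ranking steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not fix an update formula, so the code below assumes a PageRank-style rule: each dataset's rank is recomputed from the previously determined ranks of its directly linked datasets, with each linked dataset's contribution divided by the number of datasets it is linked to. The function name `rank_datasets`, the `damping` parameter, the normalization of composite scores, the treatment of lineage edges as undirected links, and the sample dataset names are all assumptions made for illustration and are not taken from the claim.

```python
def rank_datasets(composite_scores, lineage_edges, damping=0.85,
                  change_threshold=1e-6, max_iterations=1000):
    """Rank datasets from composite scores and lineage links (illustrative only).

    composite_scores: dict mapping dataset name -> positive composite score.
    lineage_edges: iterable of (upstream, downstream) dataset name pairs.
    """
    # Build the graph: one node per dataset, edges taken from the lineage pairs.
    # Assumption: "directly connected" is treated as an undirected link.
    neighbors = {name: set() for name in composite_scores}
    for upstream, downstream in lineage_edges:
        neighbors[upstream].add(downstream)
        neighbors[downstream].add(upstream)

    # Set the initial rank of each dataset from its composite score
    # (normalized to sum to 1, which is an assumption of this sketch).
    total = sum(composite_scores.values())
    base = {name: score / total for name, score in composite_scores.items()}
    ranks = dict(base)

    for _ in range(max_iterations):
        updated = {}
        for name in ranks:
            # (i) use the previously determined ranks of the linked datasets,
            # (ii) scaled by the number of datasets each of them is linked to.
            inflow = sum(ranks[other] / len(neighbors[other])
                         for other in neighbors[name])
            updated[name] = (1 - damping) * base[name] + damping * inflow

        # Stop once the sum of rank changes falls below the change threshold.
        total_change = sum(abs(updated[name] - ranks[name]) for name in ranks)
        ranks = updated
        if total_change < change_threshold:
            break

    # Sort the datasets by rank, highest first.
    ordering = sorted(ranks, key=ranks.get, reverse=True)
    return ordering, ranks


if __name__ == "__main__":
    # Hypothetical datasets and lineage, chosen only to exercise the sketch.
    scores = {"raw": 1.0, "clean": 1.0, "report": 1.0, "orphan": 1.0}
    edges = [("raw", "clean"), ("clean", "report")]
    ordering, ranks = rank_datasets(scores, edges)
    # Stand-in for "presenting, in an interface": print the sorted list.
    for name in ordering:
        print(f"{name}\t{ranks[name]:.4f}")
```

In this sketch the loop plays the role of the recursive updating: every pass takes the ranks from the previous pass as input, and the `max_iterations` cap is only a safety guard added here. Other update rules that use linked-dataset ranks and link counts would fit the claim language equally well.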