US 12,248,488 B2
Data recommender using lineage to propagate value indicators
Ted Dunning, Santa Clara, CA (US); Suparna Bhattacharya, Bangalore (IN); Glyn Bowden, Bristol (GB); Lin A. Nease, San Jose, CA (US); Janice M. Zdankus, San Jose, CA (US); and Sonu Sudhakaran, Bangalore (IN)
Assigned to Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed by HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, Spring, TX (US)
Filed on Jul. 12, 2023, as Appl. No. 18/351,355.
Application 18/351,355 is a continuation of application No. 17/843,757, filed on Jun. 17, 2022, granted, now 11,907,241.
Prior Publication US 2023/0409587 A1, Dec. 21, 2023
Int. Cl. G06F 16/248 (2019.01); G06F 16/2455 (2019.01); G06F 16/25 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/248 (2019.01) [G06F 16/24556 (2019.01); G06F 16/254 (2019.01); G06F 16/288 (2019.01)]
20 Claims
 
1. A system comprising:
a graphical user interface (GUI);
a data pipeline database;
one or more processing resources; and
a non-transitory computer-readable medium, coupled to the one or more processing resources, having stored therein instructions that when executed by the one or more processing resources cause the system to:
extract and characterize metadata from data processing pipelines, wherein extracting and characterizing metadata from the data processing pipelines comprises generating lineage representations for the data processing pipelines, wherein the lineage representations comprise one or more directed graphs comprising associations between a plurality of data-related products and a plurality of ancestor data-related products and processing steps by which the plurality of data-related products are derived from the plurality of ancestor data-related products;
catalog the characterized metadata;
update the lineage representations by propagating value indicators through the lineage representations for the data processing pipelines;
based on the propagation of the value indicators, identify a subset of ancestor data-related products, from the plurality of ancestor data-related products of the data processing pipelines, having highest relative value indicator;
encapsulate the updated lineage representations and characterized metadata as one or more exportable data-related packages stored to non-transitory computer-readable memory of the data pipeline database;
reference the exportable data-related packages based on a target data analysis project received via the GUI;
based on the desired characterized metadata, identify a high value-influencing metadata characteristic from the extracted metadata that arises with statistically significant frequency among the subset of ancestor data-related products having the highest relative value indicator to at least one of the generated desired metadata characteristics;
recommend, via the GUI, one or more ancestor data-related products of the subset of ancestor data-related products having the high value-influencing metadata characteristic for the target data analysis project; and
generate the target data analysis project using the recommended one or more of the ancestor data-related products.
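
The lineage, propagation, and recommendation steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the product names, metadata tags, the per-hop `decay` factor, the top-`k` cutoff, and the `min_share` threshold (a simple stand-in for "statistically significant frequency") are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical lineage graph: each product maps to the ancestor
# products it was derived from (edges point from product to ancestor).
LINEAGE = {
    "report": ["features_a", "features_b"],
    "features_a": ["raw_sales"],
    "features_b": ["raw_sales", "raw_weather"],
    "raw_sales": [],
    "raw_weather": [],
}

# Hypothetical characterized metadata for each ancestor product.
METADATA = {
    "features_a": {"format:parquet", "domain:retail"},
    "features_b": {"format:parquet", "domain:retail"},
    "raw_sales": {"format:parquet", "domain:retail"},
    "raw_weather": {"format:csv", "domain:climate"},
}


def propagate_values(lineage, seed_values, decay=0.5):
    """Propagate value indicators from seeded products to their ancestors.

    Each hop multiplies the value by `decay` (an assumed attenuation rule);
    an ancestor reached along several paths accumulates every contribution.
    """
    values = defaultdict(float)

    def visit(product, value):
        values[product] += value
        for ancestor in lineage.get(product, []):
            visit(ancestor, value * decay)

    for product, value in seed_values.items():
        visit(product, value)
    return dict(values)


def top_ancestors(values, seeds, k):
    """Return the k ancestors with the highest propagated value."""
    ancestors = [p for p in values if p not in seeds]
    # Sort by value descending, then by name for a deterministic order.
    ancestors.sort(key=lambda p: (-values[p], p))
    return ancestors[:k]


def influential_tags(subset, metadata, min_share=0.6):
    """Return tags present in at least `min_share` of the subset."""
    counts = Counter(tag for p in subset for tag in metadata.get(p, ()))
    return sorted(t for t, n in counts.items() if n / len(subset) >= min_share)


def recommend(subset, metadata, tags):
    """Recommend subset members carrying at least one influential tag."""
    return [p for p in subset if metadata.get(p, set()) & set(tags)]


seeds = {"report": 1.0}  # assumed value indicator assigned to the end product
values = propagate_values(LINEAGE, seeds)
subset = top_ancestors(values, seeds, k=3)
tags = influential_tags(subset, METADATA)
recommended = recommend(subset, METADATA, tags)
```

In this toy graph, `raw_sales` is reached through both feature sets and therefore accumulates value along two paths, which is the kind of effect that propagation through a lineage graph is meant to surface. A real system would replace the share threshold with an actual significance test.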