CPC G06F 8/433 (2013.01) [G06F 3/0428 (2013.01); G06F 3/0482 (2013.01); G06F 3/0649 (2013.01); G06F 8/10 (2013.01); G06F 8/34 (2013.01); G06F 8/4452 (2013.01); G06F 16/144 (2019.01); G06F 16/211 (2019.01); G06F 16/2322 (2019.01); G06F 16/2358 (2019.01); G06F 16/254 (2019.01); G06F 16/435 (2019.01); G06F 17/18 (2013.01); G06F 40/30 (2020.01); G06N 5/022 (2013.01); G06N 5/04 (2013.01); G06N 5/046 (2013.01); G06N 20/00 (2019.01); G06Q 10/0637 (2013.01); G06F 9/5061 (2013.01)]
20 Claims

1. A method for use with a data integration or other computing environment, comprising:
providing, at a computational environment, a design-time system comprising a software development component that enables the design of dataflow software applications and includes a user interface for the design and management of software application pipelines, wherein an application pipeline defines and is associated with a data flow having a plurality of semantic actions that operate on input data to prepare it as output data;
providing a knowledge source that stores metadata associated with processing data flows associated with input hubs and output hubs;
wherein each data flow includes a specification of one or more data sources and data targets that operate as hubs and comprise datasets having attributes associated therewith;
wherein a data flow is associated with actions that operate on one or more input datasets to transform and output data to one or more output datasets; and
wherein a dataflow software application operates so that data is received from a data source operating as an input hub and comprising a source dataset, and provided to a target dataset at a same or other hub, according to the application pipeline and defined data flow; and
providing, for use with the application pipeline, a recommended mapping of semantic actions between the source dataset and the target dataset, based on profiling of the datasets, including:
performing a metadata analysis of the data received from the data source, and generating, by reference to the knowledge source, a profile of the data;
determining a candidate set of hubs and datasets for mapping, responsive to a search query;
comparing pairs of source and target datasets associated with the candidate set, based on profiling the data therein; and
providing, based on assessing a similarity between the source and target datasets, one or more recommended mappings of hubs and datasets to the user interface, reflective of a stage associated with the design or management of the application pipeline, including displaying within the user interface the one or more recommended mappings for inclusion within the dataflow software application and associated data flow.
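The recommendation steps recited in claim 1 (profiling the data, determining a candidate set responsive to a search query, comparing source/target pairs, and recommending mappings based on similarity) can be illustrated with a minimal Python sketch. The claim does not specify a profiling method, a query-matching rule, or a similarity measure, so everything below is a hypothetical choice for illustration only: the `Dataset` type, a profile built from (attribute name, type) pairs, substring matching of the query against hub and dataset names, and Jaccard similarity as the comparison. The knowledge source and user interface are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical dataset descriptor: the hub it lives in, its name,
    # and a mapping of attribute name -> data type.
    hub: str
    name: str
    attributes: dict[str, str]


def profile(ds: Dataset) -> set[tuple[str, str]]:
    # Hypothetical profile: the set of (lower-cased attribute name, type) pairs.
    return {(attr.lower(), typ) for attr, typ in ds.attributes.items()}


def candidates(catalog: list[Dataset], query: str) -> list[Dataset]:
    # Candidate set: datasets whose hub or dataset name contains the query text.
    q = query.lower()
    return [d for d in catalog if q in d.hub.lower() or q in d.name.lower()]


def jaccard(a: set, b: set) -> float:
    # Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(source: Dataset, catalog: list[Dataset], query: str,
              top_k: int = 3, threshold: float = 0.0) -> list[tuple[float, str, str]]:
    # Compare the source profile against each candidate target and return
    # up to top_k (score, hub, dataset) tuples with score above threshold,
    # highest score first (ties broken by hub, then dataset name).
    src = profile(source)
    scored = []
    for target in candidates(catalog, query):
        if target is source:
            continue  # never recommend mapping a dataset onto itself
        score = jaccard(src, profile(target))
        if score > threshold:
            scored.append((score, target.hub, target.name))
    scored.sort(key=lambda s: (-s[0], s[1], s[2]))
    return scored[:top_k]


if __name__ == "__main__":
    source = Dataset("crm", "customers", {"id": "int", "name": "str", "email": "str"})
    catalog = [
        Dataset("warehouse", "dim_customer",
                {"ID": "int", "Name": "str", "email": "str", "region": "str"}),
        Dataset("warehouse", "fact_sales", {"order_id": "int", "amount": "float"}),
        Dataset("lake", "raw_events", {"id": "int"}),
    ]
    print(recommend(source, catalog, "warehouse"))
```

With the sample data, only the two `warehouse` datasets enter the candidate set; `dim_customer` shares three of four profile entries with the source (score 0.75), while `fact_sales` shares none and is filtered out. A production system would likely replace the name/type profile with richer statistics (value distributions, patterns, semantic types) drawn from the knowledge source.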