CPC G06F 16/215 (2019.01) [G06F 16/2358 (2019.01); G06F 16/254 (2019.01); G06F 16/26 (2019.01)]; 17 Claims
1. A system for providing a traceable end-to-end push-based pipeline and monitoring of data quality in extract, transform, load (ETL) environments, comprising:
a computer comprising a microprocessor; and, provided at the computer:
an extract, transform, load (ETL) push-based pipeline that operates to extract data from one or more data sources, transform the data as needed, and load the transformed data into a data store for subsequent use;
a data layer that operates to trace the impact of one or more tasks at the data level;
a task layer that operates, when the system runs a particular job, to define the task sequence and dependencies, and that uses a task state table or task state events to trigger downstream jobs; and
an orchestrator that operates to build job orchestration rules and maintain a pipeline performance dashboard or visualization;
wherein the pipeline is associated with one or more jobs that comprise one or more tasks, and wherein each task is associated with a program or process that operates within or as part of the pipeline and serves a particular data function;
wherein the pipeline components are decoupled over multiple dimensions;
and wherein the system maintains a table-of-tables, or control table, that the system uses to trace task performance and detailed data and table changes as the pipeline executes.
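The task layer's use of a task state table to trigger downstream work can be illustrated with a minimal Python sketch. The claim does not say how dependencies or states are stored, so `TaskLayer`, `TaskState`, and `record` are hypothetical names, and the state table is assumed to be an in-memory dictionary rather than a database table. The same mechanism would apply whether the downstream units are tasks or whole jobs; here they are modeled as named tasks. Recording a successful task pushes work forward by returning every pending downstream task whose upstream dependencies have all succeeded; a failed task releases nothing.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class TaskLayer:
    """Hypothetical task layer holding task dependencies and a task state table."""

    # task name -> names of the upstream tasks it depends on
    dependencies: dict[str, set[str]]
    # task state table: task name -> current state
    state_table: dict[str, TaskState] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every task, including upstream-only ones, starts out PENDING.
        for task, upstreams in self.dependencies.items():
            self.state_table.setdefault(task, TaskState.PENDING)
            for upstream in upstreams:
                self.state_table.setdefault(upstream, TaskState.PENDING)

    def record(self, task: str, state: TaskState) -> list[str]:
        """Write a task state event and return the downstream tasks it releases."""
        self.state_table[task] = state
        if state is not TaskState.SUCCEEDED:
            return []  # only success pushes work downstream
        ready = []
        for downstream, upstreams in self.dependencies.items():
            if (
                task in upstreams
                and self.state_table[downstream] is TaskState.PENDING
                and all(self.state_table[u] is TaskState.SUCCEEDED for u in upstreams)
            ):
                ready.append(downstream)
        return ready


if __name__ == "__main__":
    layer = TaskLayer({
        "extract": set(),
        "transform": {"extract"},
        "load": {"transform"},
        "report": {"transform", "load"},
    })
    print(layer.record("extract", TaskState.SUCCEEDED))    # ['transform']
    print(layer.record("transform", TaskState.SUCCEEDED))  # ['load']
    print(layer.record("load", TaskState.SUCCEEDED))       # ['report']
```

Here `report` depends on both `transform` and `load`, so it is released only after `load` succeeds. In a real deployment the state table would likely be persisted in a shared store, with the returned task names handed to a scheduler.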
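The table-of-tables, or control table, can likewise be sketched in Python. The claim states only that this table traces task performance and detailed data and table changes; the schema below (`ControlRecord` with row counts, duration, and status) and the `ControlTable.run_traced` wrapper are hypothetical, and an in-memory dictionary of row lists stands in for the data store. Each traced task appends one control record, including when the task raises, so the control table doubles as an audit trail of what each task did to each table.

```python
import time
from dataclasses import dataclass
from typing import Callable

Row = dict
Store = dict[str, list[Row]]


@dataclass(frozen=True)
class ControlRecord:
    """One row of a hypothetical control table (table-of-tables)."""

    job: str
    task: str
    table: str          # data table the task wrote to
    rows_before: int    # row count before the task ran
    rows_after: int     # row count after the task ran (or failed)
    duration_s: float   # task wall-clock time in seconds
    status: str         # "succeeded" or "failed"


class ControlTable:
    """Hypothetical control table that traces task performance and table changes."""

    def __init__(self) -> None:
        self.records: list[ControlRecord] = []

    def run_traced(self, job: str, task: str, table: str, store: Store,
                   fn: Callable[[list[Row]], list[Row]]) -> str:
        """Apply fn to store[table] and log one control record, even on failure."""
        before = len(store.get(table, []))
        status = "failed"
        start = time.perf_counter()
        try:
            # fn gets a shallow copy, so a failing task cannot partially
            # replace the table's row list in the store
            store[table] = fn(list(store.get(table, [])))
            status = "succeeded"
        finally:
            self.records.append(ControlRecord(
                job, task, table, before, len(store.get(table, [])),
                time.perf_counter() - start, status))
        return status

    def history(self, table: str) -> list[ControlRecord]:
        """Return every traced change to one table, oldest first."""
        return [r for r in self.records if r.table == table]


if __name__ == "__main__":
    store = {"orders": [{"id": 1, "amount": 10},
                        {"id": 2, "amount": None},
                        {"id": 3, "amount": 5}]}
    control = ControlTable()
    # Hypothetical data-quality task: drop rows with a missing amount.
    control.run_traced("daily_load", "drop_null_amounts", "orders", store,
                       lambda rows: [r for r in rows if r["amount"] is not None])
    rec = control.history("orders")[0]
    print((rec.task, rec.rows_before, rec.rows_after, rec.status))
    # ('drop_null_amounts', 3, 2, 'succeeded')
```

Row counts are only the coarsest kind of data change; a fuller implementation might also record checksums, schema versions, or per-table data quality metrics, which is the kind of detail the data layer and the orchestrator's performance dashboard could draw on.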