US 12,189,588 B1
System and method for traceable push-based ETL pipeline and monitoring of data quality in ETL environments
Chunming Liu, Clyde Hill, WA (US); and Manasa Pola, Bellevue, WA (US)
Assigned to ORACLE INTERNATIONAL CORPORATION
Filed by ORACLE INTERNATIONAL CORPORATION, Redwood Shores, CA (US)
Filed on Aug. 21, 2023, as Appl. No. 18/236,214.
Int. Cl. G06F 16/215 (2019.01); G06F 16/23 (2019.01); G06F 16/25 (2019.01); G06F 16/26 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/2358 (2019.01); G06F 16/254 (2019.01); G06F 16/26 (2019.01)] 17 Claims
 
1. A system for providing a traceable end-to-end push-based pipeline and monitoring of data quality in extract, transform, load (ETL) environments, comprising:
a computer comprising a microprocessor, which provides
an ETL push-based pipeline that operates to extract data from one or more data sources, transform the data as needed, and load the transformed data into a data store for subsequent use;
a data layer that operates to trace the impact of one or more tasks at the data level;
a task layer that, when the system runs a particular job, operates to define the task sequence and dependencies and uses a task state table or event to trigger downstream jobs; and
an orchestrator that operates to build job orchestration rules and maintain a pipeline performance dashboard or visualization;
wherein the pipeline is associated with one or more jobs, each comprising one or more tasks, wherein a task is associated with a program or process that operates within or as part of a pipeline and serves a particular data function;
wherein the pipeline components are decoupled over multiple dimensions;
wherein the system maintains a table-of-tables or control table, which the system uses to trace task performance and detailed data and table changes as the pipeline executes.
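The task layer's push-based triggering, in which a task state table or event starts downstream work, can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Task`, `TaskState`, and `TaskStateTable`, the four state values, and the in-memory dictionary standing in for the task state table are all hypothetical assumptions. The claim recites triggering downstream jobs; the sketch applies the same idea at task granularity. When a task succeeds it pushes a state-change event, and every downstream task whose upstream tasks have all succeeded runs at once.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Set


class TaskState(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class Task:
    # Hypothetical task record: its name, the upstream tasks it depends on,
    # and the callable that performs its data function.
    name: str
    upstream: Set[str]
    run: Callable[[], None]


class TaskStateTable:
    """Hypothetical in-memory task state table that pushes state-change events."""

    def __init__(self, tasks: List[Task]) -> None:
        self.tasks: Dict[str, Task] = {t.name: t for t in tasks}
        self.state: Dict[str, TaskState] = {t.name: TaskState.PENDING for t in tasks}
        self.order: List[str] = []  # names of succeeded tasks, in completion order

    def _ready(self, name: str) -> bool:
        # Ready = still pending and every upstream task has succeeded.
        return self.state[name] is TaskState.PENDING and all(
            self.state[u] is TaskState.SUCCEEDED for u in self.tasks[name].upstream
        )

    def _execute(self, name: str) -> None:
        self.state[name] = TaskState.RUNNING
        try:
            self.tasks[name].run()
        except Exception:
            self.state[name] = TaskState.FAILED
            return  # no event is pushed, so downstream tasks stay PENDING
        self.state[name] = TaskState.SUCCEEDED
        self.order.append(name)
        self._push_event(name)

    def _push_event(self, name: str) -> None:
        # Push model: the finished task triggers its downstream tasks directly,
        # instead of downstream tasks polling for their inputs.
        for other, task in self.tasks.items():
            if name in task.upstream and self._ready(other):
                self._execute(other)

    def start(self) -> None:
        # Begin with the tasks that have no upstream dependencies.
        for name, task in self.tasks.items():
            if not task.upstream and self._ready(name):
                self._execute(name)


if __name__ == "__main__":
    log: List[str] = []
    table = TaskStateTable([
        Task("extract", set(), lambda: log.append("extract")),
        Task("clean", {"extract"}, lambda: log.append("clean")),
        Task("enrich", {"extract"}, lambda: log.append("enrich")),
        Task("load", {"clean", "enrich"}, lambda: log.append("load")),
    ])
    table.start()
    print(table.order)  # ['extract', 'clean', 'enrich', 'load']
```

In the usage example, "load" waits until both "clean" and "enrich" have succeeded, because only the second completion event finds all of its upstream tasks in the SUCCEEDED state. Recursion keeps the sketch short; a production scheduler would more likely drain a queue or consume events from a message broker.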
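The control table (table-of-tables) could be sketched as one row per task execution. This minimal Python example uses the standard-library `sqlite3` module purely for illustration; the table name `etl_control`, its columns, and the helpers `row_count` and `traced_task` are hypothetical and do not come from the patent. Each wrapped task records its status, its duration, and the target table's row count before and after it runs, which is one simple way to trace both task performance and table changes as the pipeline executes.

```python
import sqlite3
import time
from typing import Callable

# Hypothetical control-table schema: one row per task execution.
CONTROL_DDL = """
CREATE TABLE IF NOT EXISTS etl_control (
    run_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    job_name     TEXT NOT NULL,
    task_name    TEXT NOT NULL,
    target_table TEXT NOT NULL,
    status       TEXT NOT NULL,
    rows_before  INTEGER NOT NULL,
    rows_after   INTEGER NOT NULL,
    duration_sec REAL NOT NULL
)
"""


def row_count(conn: sqlite3.Connection, table: str) -> int:
    # Identifiers cannot be bound as SQL parameters, so `table` must be a trusted name.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def traced_task(
    conn: sqlite3.Connection,
    job_name: str,
    task_name: str,
    target_table: str,
    body: Callable[[sqlite3.Connection], object],
) -> str:
    """Run body(conn), then log its status, duration, and row-count change."""
    conn.execute(CONTROL_DDL)
    before = row_count(conn, target_table)
    start = time.perf_counter()
    try:
        body(conn)
        conn.commit()
        status = "SUCCEEDED"
    except Exception:
        conn.rollback()  # undo the task's partial writes before recording the failure
        status = "FAILED"
    duration = time.perf_counter() - start
    after = row_count(conn, target_table)
    conn.execute(
        "INSERT INTO etl_control (job_name, task_name, target_table, status,"
        " rows_before, rows_after, duration_sec) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (job_name, task_name, target_table, status, before, after, duration),
    )
    conn.commit()
    return status


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    traced_task(conn, "daily_sales", "load_sales", "sales",
                lambda c: c.executemany("INSERT INTO sales VALUES (?, ?)",
                                        [(1, 9.5), (2, 3.0)]))
    print(conn.execute(
        "SELECT task_name, status, rows_before, rows_after FROM etl_control"
    ).fetchall())
    # [('load_sales', 'SUCCEEDED', 0, 2)]
```

Row counts are only a coarse proxy for "detailed data and table changes"; a fuller trace might also capture checksums or per-column statistics in additional control-table columns.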