US 11,989,199 B2
Optimizing flow of data within ETL data processing pipeline
Srinivas Mudigonda, Hyderabad (IN); Syam Dulla, Hyderabad (IN); Namit Kabra, Hyderabad (IN); and Alekhya Telekicherla, Pennant Hills (AU)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Sep. 8, 2021, as Appl. No. 17/469,679.
Prior Publication US 2023/0074414 A1, Mar. 9, 2023
Int. Cl. G06F 16/25 (2019.01); G06F 16/21 (2019.01); G06F 16/22 (2019.01)
CPC G06F 16/254 (2019.01) [G06F 16/211 (2019.01); G06F 16/221 (2019.01)]
20 Claims
1. A computer-implemented method for optimizing a flow of data within extract, transform, load (ETL) data processing pipelines, the method comprising:
identifying which database columns from a source database are to be transformed in data processing stages of a processing segment of an ETL data processing pipeline and which database columns from said source database are not to be transformed in said data processing stages of said processing segment of said ETL data processing pipeline;
grouping database columns to be transformed into a processing schema;
performing transformations on said database columns of said processing schema;
grouping database columns that are not to be transformed into a non-processing schema;
creating a large object data type to reference said non-processing schema; and
creating and inserting an identifier in said data processing stages to identify said large object data type thereby avoiding copying of said database columns that are not to be transformed in said data processing stages.
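
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: rows are modeled as dictionaries, the large object data type is modeled as a JSON-encoded byte string held in an in-memory store, and every name used here (`LobStore`, `split_schemas`, `run_segment`, `reassemble`, the `_lob_id` key) is hypothetical and does not appear in the patent.

```python
import json
import uuid
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical model of a stage: the column it transforms and the function applied.
Stage = Tuple[str, Callable[[Any], Any]]


def split_schemas(columns: List[str], stages: List[Stage]) -> Tuple[List[str], List[str]]:
    """Identify which columns the segment's stages transform and which they do not."""
    touched = {col for col, _ in stages}
    processing = [c for c in columns if c in touched]
    non_processing = [c for c in columns if c not in touched]
    return processing, non_processing


class LobStore:
    """Assumed stand-in for large object storage: one blob per row, keyed by an identifier."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, values: Row) -> str:
        lob_id = uuid.uuid4().hex
        self._blobs[lob_id] = json.dumps(values).encode("utf-8")
        return lob_id

    def get(self, lob_id: str) -> Row:
        return json.loads(self._blobs[lob_id].decode("utf-8"))


def run_segment(rows: List[Row], columns: List[str],
                stages: List[Stage], store: LobStore) -> List[Row]:
    """Pass only the processing schema plus an identifier through the stages."""
    processing, non_processing = split_schemas(columns, stages)
    out = []
    for row in rows:
        # Untransformed columns are parked once in the large object.
        lob_id = store.put({c: row[c] for c in non_processing})
        slim = {c: row[c] for c in processing}
        slim["_lob_id"] = lob_id  # identifier carried through the stages
        for col, fn in stages:
            slim[col] = fn(slim[col])
        out.append(slim)
    return out


def reassemble(slim: Row, store: LobStore) -> Row:
    """Rejoin the transformed columns with the untransformed ones after the segment."""
    full = {k: v for k, v in slim.items() if k != "_lob_id"}
    full.update(store.get(slim["_lob_id"]))
    return full


rows = [{"id": 1, "name": "ada", "price": 10, "notes": "long text"}]
stages = [("name", str.upper), ("price", lambda p: p * 2)]
store = LobStore()
result = run_segment(rows, ["id", "name", "price", "notes"], stages, store)
full_row = reassemble(result[0], store)
```

In this sketch the stages see only `name`, `price`, and the identifier, so `id` and `notes` are written once to the store rather than copied from stage to stage, which is the saving the claim describes.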