US 11,720,586 B2
Automatic conversion of data within data pipeline
Kiran A Kate, Chappaqua, NY (US); Martin Hirzel, Ossining, NY (US); and Avraham Ever Shinnar, Westchester, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Mar. 26, 2021, as Appl. No. 17/213,716.
Prior Publication US 2022/0309073 A1, Sep. 29, 2022
Int. Cl. G06F 16/25 (2019.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06F 17/18 (2006.01)
CPC G06F 16/258 (2019.01) [G06N 5/04 (2013.01); G06N 20/00 (2019.01); G06F 17/18 (2013.01)]
18 Claims
 
1. A computer-implemented method comprising:
analyzing identified data for a determined conversion of the identified data, wherein the identified data is input data stored on an external database;
identifying a plurality of relational factors within the analyzed data, wherein the plurality of relational factors are used to determine data joins of a first dataset and a second dataset of the analyzed data;
standardizing the analyzed data into a uniform domain using a relational algebra algorithm;
mapping a data route within the standardized data based on the plurality of relational factors, wherein the data route is predicted based on training data from previously mapped data routes, and wherein the data route comprises a plurality of transformers configured to reduce aggregated data by removing one or more datasets from the standardized data;
compressing the analyzed data based on the mapped data route, wherein the data joins of the first dataset and the second dataset are joined based on relational factors within the mapped data route, and wherein the one or more aggregated datasets are removed from the analyzed data to create a compressed dataset;
converting the compressed dataset into the uniform domain using the relational algebra algorithm; and
dynamically transmitting the converted, compressed dataset into at least one section of a machine learning data pipeline.
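Claim 1 does not say how relational factors are detected, what the "uniform domain" is, or how a data route is predicted from previously mapped routes. The Python sketch below is one hypothetical, standard-library-only reading of the claimed steps, not the patented implementation. Every name in it is an assumption introduced for illustration: the helpers `standardize`, `relational_factors`, `equi_join`, `predict_route`, `run_route` and `convert_and_transmit`; the transformer names `"drop_aggregates"` and `"join"`; the dataset keys `"first"` and `"second"`; and the choice of a Jaccard nearest-neighbour lookup as the route predictor.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]
Relation = List[Row]
Datasets = Dict[str, Relation]
# Hypothetical training data: (signature of a past input, route mapped for it).
History = List[Tuple[Set[str], List[str]]]


def standardize(rel: Relation) -> Relation:
    # Hypothetical "uniform domain": trimmed, lower-case column names and
    # trimmed string values, so every dataset follows one convention.
    return [
        {k.strip().lower(): v.strip() if isinstance(v, str) else v
         for k, v in row.items()}
        for row in rel
    ]


def relational_factors(a: Relation, b: Relation) -> List[str]:
    # Candidate join keys: columns present in both relations whose value
    # sets actually overlap (assumes each relation has a uniform schema).
    if not a or not b:
        return []
    keys = []
    for col in sorted(set(a[0]) & set(b[0])):
        if {r.get(col) for r in a} & {r.get(col) for r in b}:
            keys.append(col)
    return keys


def equi_join(a: Relation, b: Relation, keys: Sequence[str]) -> Relation:
    # Relational-algebra equi-join done as a hash join on `keys`; with no
    # keys every tuple matches, giving the Cartesian product.
    index: Dict[tuple, List[Row]] = {}
    for row in b:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    return [
        {**left, **right}
        for left in a
        for right in index.get(tuple(left[k] for k in keys), [])
    ]


def predict_route(signature: Set[str], history: History) -> List[str]:
    # Stand-in predictor: reuse the route of the previously mapped example
    # whose signature has the highest Jaccard similarity to this one.
    def jaccard(x: Set[str], y: Set[str]) -> float:
        return len(x & y) / len(x | y) if x | y else 0.0
    return max(history, key=lambda item: jaccard(signature, item[0]))[1]


def run_route(data: Datasets, route: List[str], aggregated: Set[str]) -> Datasets:
    # Apply each transformer named in the route, in order; whatever remains
    # is the compressed dataset.
    for step in route:
        if step == "drop_aggregates":
            data = {n: r for n, r in data.items() if n not in aggregated}
        elif step == "join":
            keys = relational_factors(data["first"], data["second"])
            joined = equi_join(data["first"], data["second"], keys)
            rest = {n: r for n, r in data.items() if n not in ("first", "second")}
            data = {**rest, "joined": joined}
        else:
            raise ValueError(f"unknown transformer: {step}")
    return data


def convert_and_transmit(data: Datasets, history: History, aggregated: Set[str],
                         stage: Callable[[Datasets], object]) -> object:
    std = {n: standardize(r) for n, r in data.items()}
    # Hypothetical route signature: the set of standardized column names.
    signature = {col for rel in std.values() for row in rel for col in row}
    route = predict_route(signature, history)
    compressed = run_route(std, route, aggregated)
    # Re-map the compressed result into the uniform domain before hand-off.
    converted = {n: standardize(r) for n, r in compressed.items()}
    return stage(converted)  # transmit to one section of an ML pipeline


# Usage with made-up data: an aggregate table is dropped, two tables joined.
orders = [{"Customer_ID ": 1, "Amount": 30}, {"Customer_ID ": 2, "Amount": 12}]
customers = [{"customer_id": 1, "Region": " EU "}, {"customer_id": 2, "Region": "US"}]
daily_totals = [{"day": "2021-03-26", "total": 42}]
history = [({"customer_id", "amount", "region"}, ["drop_aggregates", "join"]),
           ({"sensor", "reading"}, ["drop_aggregates"])]
received: List[Datasets] = []
convert_and_transmit({"first": orders, "second": customers, "daily_totals": daily_totals},
                     history, {"daily_totals"}, received.append)
```

Here the pipeline section is simply `received.append`; in practice it would be whatever consumes training data. The nearest-neighbour lookup is only the simplest way to "predict" a route from earlier examples; a trained classifier over the same signatures could take its place without changing the other steps.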