| CPC G06N 5/02 (2013.01) [G06F 8/75 (2013.01); G06F 8/77 (2013.01); G06F 11/3608 (2013.01); G06N 20/00 (2019.01)] | 19 Claims |

|
1. A computer system comprising:
a processor set;
one or more computer readable storage media; and
program instructions stored on the one or more computer readable storage media to cause the processor set to perform operations comprising:
pre-processing a pipeline configured to train a machine learning (ML) model, the pipeline represented in a data flow graph (DFG), including:
annotating one or more nodes of the DFG with two or more operational semantics for pipeline operations; and
selectively annotating one or more output object references to a corresponding input object and a prior node state for an output object from the DFG;
executing the pipeline represented in the DFG with the selectively annotated one or more output object references, including capturing an object lineage to provide data of how an object in the pipeline is produced with respect to the output object using the selectively annotated one or more output object references;
identifying provenance of one or more objects represented in the pipeline corresponding to a generated output including performance of the executed pipeline using the object lineage;
selectively applying a remediation action to the DFG based on the provenance of the one or more objects corresponding to the generated output; and
restarting the pipeline by executing the pipeline from a location in a sub-graph of the DFG where the remediation action was selectively applied.
|
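The claim does not say which data structures carry out these steps. The Python sketch below is one minimal way they could fit together, under stated assumptions. Each DFG node gets two operational-semantics tags. Each output reference is annotated with the ids of its input objects and the node's prior state. Lineage is captured during execution. Provenance is found by walking that lineage backwards from the evaluation output. A remediation is applied, and the pipeline is then restarted only from the remediated node and its downstream sub-graph. Every name here is hypothetical and does not come from the patent: `Node`, `ObjRef`, `Pipeline`, `remediate`, the semantics tags, the toy score and `THRESHOLD`. The sketch also models "node state" as a plain run counter and "remediation" as swapping in a corrected node function.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Callable, Optional


@dataclass
class Node:
    name: str
    fn: Callable[..., Any]
    inputs: list[str]                                 # upstream node names (DFG edges)
    semantics: set[str] = field(default_factory=set)  # hypothetical operational-semantics tags
    state: int = 0                                    # hypothetical node state: a run counter


@dataclass(frozen=True)
class ObjRef:
    oid: int                     # unique id of the produced object
    node: str                    # node that produced it
    input_oids: tuple[int, ...]  # annotated references to the corresponding input objects
    prior_state: int             # producing node's state before this execution


class Pipeline:
    def __init__(self, nodes: list[Node]):
        self.nodes = {n.name: n for n in nodes}
        self.order = [n.name for n in nodes]  # assumed to be a topological order of the DFG
        self.values: dict[int, Any] = {}      # oid -> object
        self.lineage: dict[int, ObjRef] = {}  # oid -> how that object was produced
        self.latest: dict[str, int] = {}      # node name -> oid of its most recent output
        self._ids = count()

    def _run_node(self, name: str) -> None:
        node = self.nodes[name]
        in_oids = tuple(self.latest[i] for i in node.inputs)
        out = node.fn(*(self.values[o] for o in in_oids))
        oid = next(self._ids)
        # Capture lineage by recording the input references and the prior node state.
        self.lineage[oid] = ObjRef(oid, name, in_oids, node.state)
        self.values[oid] = out
        self.latest[name] = oid
        node.state += 1

    def run(self, start: Optional[str] = None) -> None:
        # Run the whole DFG, or only `start` and the nodes downstream of it,
        # reusing cached outputs of the nodes upstream of `start`.
        dirty: set[str] = set()
        for name in self.order:
            if start is None or name == start or dirty & set(self.nodes[name].inputs):
                self._run_node(name)
                dirty.add(name)

    def provenance(self, oid: int) -> set[str]:
        # Walk the lineage backwards to find every node that contributed to `oid`.
        nodes: set[str] = set()
        seen: set[int] = set()
        stack = [oid]
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            ref = self.lineage[cur]
            nodes.add(ref.node)
            stack.extend(ref.input_oids)
        return nodes


# Toy usage: a bug in "clean" degrades the evaluation score.
pipe = Pipeline([
    Node("load", lambda: list(range(10)), [], {"source", "deterministic"}),
    Node("clean", lambda rows: [r for r in rows if r > 5], ["load"], {"transform", "remediable"}),
    Node("train", lambda rows: sum(rows) / len(rows), ["clean"], {"train", "stateful"}),
    Node("evaluate", lambda m: max(0.0, 1.0 - abs(m - 4.5)), ["train"], {"evaluate", "deterministic"}),
])
THRESHOLD = 0.5  # hypothetical acceptance threshold on the toy score


def remediate(p: Pipeline, suspects: set[str]) -> Optional[str]:
    # Hypothetical policy: patch the first remediable node found in the provenance.
    for name in p.order:
        if name in suspects and "remediable" in p.nodes[name].semantics:
            p.nodes[name].fn = lambda rows: [r for r in rows if r >= 0]
            return name
    return None


pipe.run()
first_score_oid = pipe.latest["evaluate"]
fixed = None
if pipe.values[first_score_oid] < THRESHOLD:
    fixed = remediate(pipe, pipe.provenance(first_score_oid))
    if fixed is not None:
        pipe.run(start=fixed)  # restart from the remediated sub-graph only
print(fixed, pipe.values[pipe.latest["evaluate"]])  # prints: clean 1.0
```

The restart reruns `clean`, `train` and `evaluate` but reuses the cached output of `load`, whose state stays at 1. This is the practical benefit of resuming from the sub-graph where the remediation was applied instead of re-executing the entire DFG. In a real system the provenance walk would more likely rank suspect nodes than simply take the first one tagged as remediable.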