CPC G06F 18/22 (2023.01) [G06F 18/214 (2023.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)]
17 Claims

1. A data pipeline monitoring system configured to monitor operations of a data pipeline, the data pipeline monitoring system comprising:
data processing circuitry configured to receive a training data set and process the training data set;
the data processing circuitry configured to identify a data type, data format, and data value range of the training data set based on the processing;
the data processing circuitry configured to determine an average throughput and entropy for the data pipeline;
the data processing circuitry configured to receive data configuration rules that indicate a preferred data format;
the data processing circuitry configured to generate a data standard that indicates at least the preferred data format based on the data type, data format, and data value range of the training data set, the average throughput and entropy for the data pipeline, and the data configuration rules that indicate the preferred data format;
the data processing circuitry configured to receive an output data set from the data pipeline, wherein the data pipeline receives an input data set, processes the input data set, responsively generates the output data set, and transfers the output data set to the data processing circuitry; and
the data processing circuitry configured to determine similarities between the output data set and the data standard, score the output data set based on the similarities between the output data set and the data standard, and report the score for the output data set.
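Claim 1 does not specify how formats are detected, what "entropy" is measured over, or how similarity is turned into a score. The Python sketch below is one hypothetical reading under stated assumptions, not the patented implementation. It assumes numeric training values arriving as strings, a small regex format catalogue (`FORMAT_PATTERNS`), Shannon entropy of the value distribution as the "entropy", throughput as records per second over observed `(record_count, seconds)` batches, a rules dict with a `preferred_format` key, and an equal-weight average of four similarity checks as the score. All of these names are illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical format catalogue; the claim does not say how formats are recognized.
FORMAT_PATTERNS = {
    "decimal": re.compile(r"^-?\d+\.\d+$"),
    "integer": re.compile(r"^-?\d+$"),
}


def detect_format(value: str) -> str:
    """Return the name of the first pattern the raw string matches, else 'unknown'."""
    for name, pattern in FORMAT_PATTERNS.items():
        if pattern.match(value):
            return name
    return "unknown"


def shannon_entropy(values) -> float:
    """Shannon entropy, in bits, of the empirical distribution of the values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def similarity_ratio(a: float, b: float) -> float:
    """Similarity in [0, 1] of two non-negative quantities; 1.0 means equal."""
    if max(a, b) == 0:
        return 1.0
    return min(a, b) / max(a, b)


@dataclass
class DataStandard:
    data_type: str          # "int" or "float" (numeric data only, an assumption)
    preferred_format: str   # taken from the data configuration rules
    min_value: float        # value range observed in the training data
    max_value: float
    entropy: float          # bits, measured on the training data
    avg_throughput: float   # records per second across observed batches


def generate_standard(training, batches, rules) -> DataStandard:
    """Profile numeric training strings; batches are (record_count, seconds) pairs."""
    formats = Counter(detect_format(v) for v in training)
    numbers = [float(v) for v in training]
    # The rules take precedence; fall back to the most common observed format.
    preferred = rules.get("preferred_format", formats.most_common(1)[0][0])
    data_type = "int" if set(formats) == {"integer"} else "float"
    total_records = sum(n for n, _ in batches)
    total_seconds = sum(s for _, s in batches)
    return DataStandard(
        data_type=data_type,
        preferred_format=preferred,
        min_value=min(numbers),
        max_value=max(numbers),
        entropy=shannon_entropy(training),
        avg_throughput=total_records / total_seconds,
    )


def score_output(output, standard: DataStandard, elapsed_seconds: float) -> float:
    """Equal-weight mean of four similarity checks, each in [0, 1]."""
    n = len(output)
    format_share = sum(detect_format(v) == standard.preferred_format for v in output) / n
    in_range = 0
    for v in output:
        try:
            x = float(v)
        except ValueError:
            continue  # unparseable values count as out of range
        if standard.min_value <= x <= standard.max_value:
            in_range += 1
    checks = [
        format_share,
        in_range / n,
        similarity_ratio(shannon_entropy(output), standard.entropy),
        similarity_ratio(n / elapsed_seconds, standard.avg_throughput),
    ]
    return sum(checks) / len(checks)


if __name__ == "__main__":
    std = generate_standard(
        training=["10.0", "12.5", "15.0", "12.5"],
        batches=[(100, 2.0), (300, 6.0)],
        rules={"preferred_format": "decimal"},
    )
    print(score_output(["11.0", "14.5", "99", "12.5"], std, elapsed_seconds=0.0625))
```

With these inputs the four checks are 0.75 (three of four values use the preferred decimal format), 0.75 (the value 99 falls outside the training range of 10.0 to 15.0), 0.75 (entropy of 1.5 bits in the standard against 2.0 bits in the output) and 0.78125 (50 against 64 records per second), so the script prints 0.7578125. Equal weighting is an arbitrary choice made for the sketch; a deployed monitor could weight the checks differently or report them separately.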