CPC G06F 16/2379 (2019.01) [G06F 16/2365 (2019.01)] | 25 Claims |
1. A computing device comprising:
a processor;
a storage device coupled to the processor;
an engine stored in the storage device, wherein an execution of the engine by the processor configures the computing device to perform acts comprising:
receiving a raw dataset;
receiving one or more data quality metric goals corresponding to the received raw dataset;
determining a schema of the dataset;
identifying an initial set of validation nodes based on the schema of the dataset, wherein each validation node includes a data quality check;
executing the initial set of validation nodes;
iteratively expanding and executing a next set of validation nodes based on the schema of the dataset until a termination criterion is achieved, the termination criterion including a number of successful tests reaching a predetermined threshold; and
providing a corrected version of the raw dataset based on the iterative execution of the initial and next sets of validation nodes.
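The acts recited in claim 1 can be illustrated with a minimal Python sketch. Every concrete detail is an assumption and does not come from the claim. Rows are plain dicts. The schema is inferred from the first non-null value in each column. Two hypothetical levels of validation nodes are generated from the schema: type checks first, then completeness checks against a `completeness` goal. Failed checks are repaired by simple means (casting, then imputing the type's default value). The names `ValidationNode`, `expand`, and `validate_and_correct` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class ValidationNode:
    # Hypothetical node: one data quality check plus a repair for it.
    name: str
    check: Callable[[list[Row]], bool]
    fix: Callable[[list[Row]], list[Row]]


def infer_schema(rows: list[Row]) -> dict[str, type]:
    # Assumed schema inference: type of the first non-null value per column.
    schema: dict[str, type] = {}
    for row in rows:
        for col, val in row.items():
            if val is not None and col not in schema:
                schema[col] = type(val)
    return schema


def type_node(col: str, typ: type) -> ValidationNode:
    def check(rows: list[Row]) -> bool:
        return all(r.get(col) is None or isinstance(r[col], typ) for r in rows)

    def fix(rows: list[Row]) -> list[Row]:
        out = []
        for r in rows:
            v = r.get(col)
            if v is not None and not isinstance(v, typ):
                try:
                    v = typ(v)  # try to cast to the schema type
                except (TypeError, ValueError):
                    v = None  # uncastable values become missing
            out.append({**r, col: v})
        return out

    return ValidationNode(f"type:{col}", check, fix)


def completeness_node(col: str, typ: type, goal: float) -> ValidationNode:
    def check(rows: list[Row]) -> bool:
        if not rows:
            return True
        filled = sum(r.get(col) is not None for r in rows)
        return filled / len(rows) >= goal  # compare against the metric goal

    def fix(rows: list[Row]) -> list[Row]:
        # Assumed imputation: the type's default value, e.g. 0 or "".
        return [{**r, col: typ() if r.get(col) is None else r[col]} for r in rows]

    return ValidationNode(f"complete:{col}", check, fix)


def expand(schema: dict[str, type], goals: dict[str, float], level: int) -> list[ValidationNode]:
    # Hypothetical schema-driven expansion: level 0 = type checks,
    # level 1 = completeness checks, later levels = no more nodes.
    if level == 0:
        return [type_node(c, t) for c, t in schema.items()]
    if level == 1:
        goal = goals.get("completeness", 1.0)
        return [completeness_node(c, t, goal) for c, t in schema.items()]
    return []


def validate_and_correct(raw: list[Row], goals: dict[str, float],
                         success_threshold: int) -> tuple[list[Row], int]:
    rows = [dict(r) for r in raw]  # copy so the raw dataset is not mutated
    schema = infer_schema(rows)
    successes, level = 0, 0
    nodes = expand(schema, goals, level)  # initial set of validation nodes
    while nodes:
        for node in nodes:
            passed = node.check(rows)
            if not passed:
                rows = node.fix(rows)  # correct the data, then re-test once
                passed = node.check(rows)
            if passed:
                successes += 1
        if successes >= success_threshold:  # termination criterion
            break
        level += 1
        nodes = expand(schema, goals, level)  # next set of validation nodes
    return rows, successes
```

In this sketch a node counts as a successful test if it passes either immediately or after one repair. The loop also stops when the schema yields no further nodes. The claim says the termination criterion "includes" the success threshold, so other stopping conditions like this one are left open.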