US 12,013,840 B2
Dynamic discovery and correction of data quality issues
Shrey Shrivastava, White Plains, NY (US); Anuradha Bhamidipaty, Yorktown Heights, NY (US); and Dhavalkumar C. Patel, White Plains, NY (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Oct. 20, 2020, as Appl. No. 17/075,617.
Claims priority of provisional application 63/011,990, filed on Apr. 17, 2020.
Prior Publication US 2021/0326334 A1, Oct. 21, 2021
Int. Cl. G06F 16/23 (2019.01)
CPC G06F 16/2379 (2019.01) [G06F 16/2365 (2019.01)]
25 Claims
 
1. A computing device comprising:
a processor;
a storage device coupled to the processor;
an engine stored in the storage device, wherein an execution of the engine by the processor configures the computing device to perform acts comprising:
receiving a raw dataset;
receiving one or more data quality metric goals, corresponding to the received raw dataset;
determining a schema of the dataset;
identifying an initial set of validation nodes based on the schema of the dataset, wherein each validation node includes a data quality check;
executing the initial set of validation nodes;
iteratively expanding and executing a next set of validation nodes based on the schema of the dataset until a termination criterion, including a number of successful tests reaching a predetermined threshold, is achieved; and
providing a corrected dataset of the raw dataset based on the iterative execution of the initial and next set of validation nodes.
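
The acts recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name in it (`ValidationNode`, `infer_schema`, `initial_nodes`, `expand_nodes`, `run_nodes`, `correct_dataset`, and the `min_valid_fraction` goal) is hypothetical. The specific checks (not-null, non-negative, trimmed text), the repairs, and the way a quality goal decides whether a test counts as successful are assumptions chosen only to make the loop concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class ValidationNode:
    name: str
    column: str
    check: Callable[[object], bool]   # the data quality check
    fix: Callable[[object], object]   # hypothetical repair for failing values


def infer_schema(data: List[Row]) -> Dict[str, str]:
    """Label each column 'numeric' or 'text' from its first non-null value."""
    schema: Dict[str, str] = {}
    for row in data:
        for col, val in row.items():
            if val is not None and col not in schema:
                schema[col] = "numeric" if isinstance(val, (int, float)) else "text"
    return schema


def initial_nodes(schema: Dict[str, str]) -> List[ValidationNode]:
    """Initial set: one not-null check per column (assumed starting point)."""
    nodes = []
    for col, kind in schema.items():
        default = 0 if kind == "numeric" else ""
        nodes.append(ValidationNode(f"not_null:{col}", col,
                                    lambda v: v is not None,
                                    lambda v, d=default: d))
    return nodes


def expand_nodes(schema: Dict[str, str], depth: int) -> List[ValidationNode]:
    """Next set: type-specific checks; this sketch has one expansion level."""
    if depth > 1:
        return []
    nodes = []
    for col, kind in schema.items():
        if kind == "numeric":
            nodes.append(ValidationNode(f"non_negative:{col}", col,
                                        lambda v: v >= 0, abs))
        else:
            nodes.append(ValidationNode(f"trimmed:{col}", col,
                                        lambda v: v == v.strip(), str.strip))
    return nodes


def run_nodes(data: List[Row], nodes: List[ValidationNode], min_valid: float) -> int:
    """Execute each node, repair failing values in place, count successful tests."""
    passed = 0
    for node in nodes:
        ok = [node.check(row.get(node.column)) for row in data]
        if sum(ok) / len(ok) >= min_valid:   # assumed meaning of "successful test"
            passed += 1
        for row, good in zip(data, ok):
            if not good:
                row[node.column] = node.fix(row.get(node.column))
    return passed


def correct_dataset(raw: List[Row], goals: Dict[str, float],
                    threshold: int) -> Tuple[List[Row], int]:
    data = [dict(row) for row in raw]          # leave the raw dataset untouched
    schema = infer_schema(data)                # determine the schema
    nodes = initial_nodes(schema)              # identify the initial set
    successes, depth = 0, 0
    while nodes:                               # also stops when nothing is left to expand
        successes += run_nodes(data, nodes, goals["min_valid_fraction"])
        if successes >= threshold:             # termination criterion from the claim
            break
        depth += 1
        nodes = expand_nodes(schema, depth)    # expand the next set
    return data, successes
```

Calling `correct_dataset(rows, {"min_valid_fraction": 0.6}, threshold=5)` on a small list of dictionaries returns the repaired rows together with the number of successful tests. Stopping when no further nodes can be generated is an extra safeguard added in this sketch; the claim names only the success-count threshold as part of the termination criterion.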