US 12,353,375 B2
Automated selection and ordering of data quality rules during data ingestion
Akshar Kaul, Bangalore (IN); Hima Patel, Bengaluru (IN); and Shanmukha Chaitanya Guttula, Vijayawada (IN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Nov. 9, 2023, as Appl. No. 18/505,942.
Prior Publication US 2025/0156385 A1, May 15, 2025
Int. Cl. G06F 11/00 (2006.01); G06F 16/215 (2019.01)
CPC G06F 16/215 (2019.01)
20 Claims
 
1. A computer-implemented method, comprising:
generating a snapshot of a table-formatted dataset, wherein the snapshot provides a sample comprising a reduced number of rows of the table-formatted dataset such that each column variation of the table-formatted dataset is included in the snapshot;
executing a predetermined collection of data quality (DQ) rules on the snapshot;
determining one or more performance statistics for each of the DQ rules, wherein the performance statistics indicate a likelihood that a DQ rule determines a data quality deficiency;
generating, based on the performance statistics, a subset of the DQ rules, wherein each DQ rule of the subset is selected based on the likelihood that the DQ rule selected detects a quality deficiency; and
generating an order of executing the subset of DQ rules selected, wherein the order generated specifies a sequence for applying each DQ rule of the subset to the table-formatted dataset.
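The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: the helper names (`variation`, `make_snapshot`, `rule_stats`, `select_and_order`), the reading of "column variation" as a distinct kind of value per column (null, empty string, text, or Python type), the use of the violation rate on the snapshot as the performance statistic, and the choice to order rules by descending violation rate. The claim itself does not fix any of these details.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[Row], bool]  # assumption: a rule returns True when a row VIOLATES it


def variation(value: object) -> str:
    """Hypothetical notion of a 'column variation': the kind of value in a cell."""
    if value is None:
        return "null"
    if isinstance(value, str):
        return "empty" if value.strip() == "" else "text"
    return type(value).__name__


def make_snapshot(rows: List[Row]) -> List[Row]:
    """Keep a row only if it adds a (column, variation) pair not yet covered."""
    seen = set()
    snapshot = []
    for row in rows:
        keys = {(col, variation(val)) for col, val in row.items()}
        if not keys <= seen:
            seen |= keys
            snapshot.append(row)
    return snapshot


def rule_stats(rules: Dict[str, Rule], snapshot: List[Row]) -> Dict[str, float]:
    """Violation rate per rule on the snapshot, used here as the likelihood proxy."""
    total = len(snapshot)
    return {
        name: (sum(1 for row in snapshot if rule(row)) / total if total else 0.0)
        for name, rule in rules.items()
    }


def select_and_order(stats: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Keep rules whose rate exceeds the threshold; most likely to fire goes first."""
    selected = [name for name, rate in stats.items() if rate > threshold]
    return sorted(selected, key=lambda name: (-stats[name], name))


# Illustrative data only; not taken from the patent.
rows = [
    {"id": 1, "email": "a@x.org", "age": 30},
    {"id": 2, "email": "", "age": 41},
    {"id": 3, "email": "b@x.org", "age": None},
    {"id": 4, "email": "c@x.org", "age": 25},
    {"id": 5, "email": "", "age": None},
]

rules = {
    "email_not_empty": lambda r: r["email"] == "",
    "age_present": lambda r: r["age"] is None,
    "age_in_range": lambda r: r["age"] is None or not (0 <= r["age"] <= 35),
    "id_positive": lambda r: r["id"] <= 0,
}

snapshot = make_snapshot(rows)        # reduced sample covering every variation
stats = rule_stats(rules, snapshot)   # performance statistic per rule
order = select_and_order(stats)       # selected subset, in execution order
print(order)
```

In this sketch, `order` would then drive which rules are applied to the full table and in what sequence. A rule that never fires on the snapshot (here `id_positive`) is dropped from the subset, which is one plausible reading of selecting rules by their likelihood of detecting a deficiency.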