| CPC G06N 20/20 (2019.01) [G06F 16/215 (2019.01)] | 20 Claims |

|
1. A method for improving data quality in a first dataset, comprising:
receiving the first dataset;
applying a machine-learning algorithm to the first dataset, wherein the machine-learning algorithm identifies at least one relationship between a first data column and a second data column in the first dataset;
based on the identification of the at least one relationship between the first data column and the second data column in the first dataset, generating a second dataset, wherein the second dataset is a subset of the first dataset;
concatenating a plurality of column headers in the second dataset to obtain an itemset;
computing a probability matrix of itemset combinations;
based on the probability matrix of itemset combinations, identifying at least one frequency value associated with the at least one relationship between the first data column and the second data column in the first dataset; and
improving the data quality in the first dataset by identifying at least one anomaly in the first dataset based on the at least one frequency value associated with the at least one relationship.
|
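The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading, not the claimed implementation, and it rests on several assumptions. Normalized mutual information stands in for the unspecified machine-learning algorithm. An "itemset" is taken to be a pair of cell values, each prefixed with its column header. The "probability matrix" is stored sparsely as conditional probabilities P(second item | first item). The function names, thresholds, and sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def normalized_mutual_information(xs, ys):
    """Association score in [0, 1] between two categorical columns.

    Assumption: this is a simple stand-in for the claim's unspecified
    machine-learning algorithm that identifies column relationships.
    """
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )
    hx = -sum((c / n) * math.log(c / n) for c in px.values())
    hy = -sum((c / n) * math.log(c / n) for c in py.values())
    denom = min(hx, hy)
    return mi / denom if denom > 0 else 0.0


def find_anomalies(rows, min_association=0.5, max_conditional_prob=0.2):
    """Flag rows whose value pair is rare within a related column pair.

    Both thresholds are hypothetical tuning parameters.
    Returns a list of (row_index, itemset, conditional_probability).
    """
    headers = list(rows[0])
    anomalies = []
    for a, b in combinations(headers, 2):
        xs = [r[a] for r in rows]
        ys = [r[b] for r in rows]
        # Step 1: keep only column pairs that appear to be related.
        if normalized_mutual_information(xs, ys) < min_association:
            continue
        # Step 2: the "second dataset" is the subset holding just these columns.
        subset = list(zip(xs, ys))
        # Step 3: prefix each value with its column header to form itemsets.
        itemsets = [(f"{a}={x}", f"{b}={y}") for x, y in subset]
        # Step 4: sparse probability matrix, P(second item | first item).
        pair_counts = Counter(itemsets)
        first_counts = Counter(first for first, _ in itemsets)
        prob = {
            pair: count / first_counts[pair[0]]
            for pair, count in pair_counts.items()
        }
        # Step 5: a low frequency value for a row's itemset marks an anomaly.
        for idx, items in enumerate(itemsets):
            if prob[items] <= max_conditional_prob:
                anomalies.append((idx, items, prob[items]))
    return anomalies


# Hypothetical sample data: row 5 pairs a New York ZIP code with Boston.
rows = (
    [{"zip": "10001", "city": "New York"}] * 5
    + [{"zip": "10001", "city": "Boston"}]
    + [{"zip": "02108", "city": "Boston"}] * 6
)
anomalies = find_anomalies(rows)
print(anomalies)
```

In this sample, the pair `zip=10001`, `city=Boston` occurs in one of the six rows with that ZIP code, so its conditional probability falls below the assumed 0.2 cutoff and the row is reported. A production system would likely need a more robust relationship detector and a threshold chosen from the data rather than fixed in advance.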