US 12,117,978 B2
Remediation of data quality issues in computer databases
Amaranatha Reddy Molakantalla, Bangalore (IN); Girish B Mohite, Bangalore (IN); Krishna Sumanth Gummadi, Bangalore (IN); Raj Kumar, Bangalore (IN); and Nitik Kumar, Bangalore (IN)
Assigned to Kyndryl, Inc., New York, NY (US)
Filed by KYNDRYL, INC., New York, NY (US)
Filed on Dec. 9, 2020, as Appl. No. 17/116,279.
Prior Publication US 2022/0179835 A1, Jun. 9, 2022
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01); G06F 16/23 (2019.01); G06F 16/242 (2019.01); G06F 16/2457 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/2322 (2019.01); G06F 16/2433 (2019.01); G06F 16/24573 (2019.01); G06F 16/287 (2019.01)]
20 Claims
 
1. A computer-implemented method to remediate data quality issues in a consolidated data record that includes elements from a plurality of datasets, comprising:
sourcing a first dataset from a data lake and a second dataset from the data lake;
wherein sets of metadata respectively associated with the datasets are received with the first dataset and the second dataset;
directly measuring, by said computer, based at least in part on dataset feed information, a first dataset flow quality value for the first dataset and a second dataset flow quality value for the second dataset for a consolidated data record resulting from combining a plurality of datasets, wherein the first dataset and the second dataset are based on incident tickets in a customer service troubleshooting use case;
determining a predetermined dataset flow quality threshold value from the received metadata;
responsive to said dataset flow quality value determinations, comparing, by said computer, said dataset flow quality values with the predetermined dataset flow quality threshold value and, in response, conducting, by said computer, a consolidated data record dataset flow quality correction action for datasets having a dataset flow quality value beyond said dataset flow quality threshold value until the datasets having a dataset flow quality value beyond said dataset flow quality threshold are corrected to have dataset flow quality values not beyond the dataset flow quality threshold value;
responsive to a determination that all of the first dataset and the second dataset have and/or are corrected to have dataset flow quality values not beyond the dataset flow quality threshold value, determining, by said computer, based at least in part on dataset element health information, a plurality of first dataset health quality values for said first dataset and a plurality of second dataset health quality values for said second dataset, wherein the first dataset health quality values are based on a different type of health quality attribute, wherein the second dataset health quality values are based on different types of health quality attributes, wherein the first dataset health quality values are determined by computing a percentage of the records of the first dataset that match boundaries of a range of first expected health quality attribute values (HQAVs), wherein the second dataset health quality values are determined by computing a percentage of the records of the second dataset that match boundaries of a range of second expected HQAVs;
determining a plurality of acceptable dataset health quality threshold values from the received metadata; and
responsive to said dataset health quality value determinations, comparing, by said computer, said health quality values with the determined acceptable dataset health quality threshold values and, in response, conducting, by said computer, a dataset health quality correction action for datasets having at least one dataset health quality value beyond an associated one of the determined acceptable dataset health quality threshold values that the at least one dataset health quality value is compared with.
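
The health quality step recited in the claim (a percentage of records whose attribute values fall within an expected range, compared against a per-attribute threshold taken from metadata) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `ExpectedRange`, `health_quality_value`, and `attributes_needing_correction`, the ticket fields `priority` and `resolution_hours`, and all numeric ranges and thresholds are hypothetical. The sketch also assumes that "beyond" the threshold means a percentage below it, and that a missing value counts as out of range; the claim does not state either.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence


@dataclass(frozen=True)
class ExpectedRange:
    """Inclusive boundaries of expected values for one attribute (hypothetical)."""
    low: float
    high: float


def health_quality_value(
    records: Sequence[Mapping[str, Any]],
    attribute: str,
    expected: ExpectedRange,
) -> float:
    """Return the percentage of records whose attribute lies within the range.

    Assumption: records with a missing or None value count as non-matching,
    and an empty dataset scores 0.0.
    """
    if not records:
        return 0.0
    matches = 0
    for record in records:
        value = record.get(attribute)
        if value is not None and expected.low <= value <= expected.high:
            matches += 1
    return 100.0 * matches / len(records)


def attributes_needing_correction(
    records: Sequence[Mapping[str, Any]],
    expected_ranges: Mapping[str, ExpectedRange],
    thresholds: Mapping[str, float],
) -> list[str]:
    """List attributes whose health quality value falls below its threshold.

    Assumption: the thresholds stand in for values read from dataset metadata.
    """
    failing = []
    for attribute, expected in expected_ranges.items():
        score = health_quality_value(records, attribute, expected)
        if score < thresholds[attribute]:
            failing.append(attribute)
    return failing


# Hypothetical incident-ticket records, for illustration only.
tickets = [
    {"priority": 1, "resolution_hours": 5.0},
    {"priority": 2, "resolution_hours": 80.0},
    {"priority": 9, "resolution_hours": 12.0},
    {"priority": 3, "resolution_hours": None},
]
ranges = {
    "priority": ExpectedRange(1, 4),
    "resolution_hours": ExpectedRange(0.0, 72.0),
}
thresholds = {"priority": 70.0, "resolution_hours": 90.0}

failing = attributes_needing_correction(tickets, ranges, thresholds)
print(failing)
```

In this sketch each attribute yields its own health quality value and is checked against its own threshold, mirroring the claim's "associated one of the determined acceptable dataset health quality threshold values". A correction action would then be triggered for the dataset if any attribute is listed.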