US 11,693,836 B2
System, method, and computer program product for monitoring and improving data quality
Chiranjeet Chetia, Round Rock, TX (US); Punit Kumar Rajgarhia, San Francisco, CA (US); Hangqi Zhao, Seattle, WA (US); Claudia Carolina Barcenas Cardenas, Austin, TX (US); and Jianhua Huang, Cedar Park, TX (US)
Assigned to Visa International Service Association, San Francisco, CA (US)
Filed by Visa International Service Association, San Francisco, CA (US)
Filed on Jul. 13, 2020, as Appl. No. 16/927,593.
Application 16/927,593 is a continuation in part of application No. 16/742,463, filed on Jan. 14, 2020, abandoned.
Claims priority of provisional application 62/960,917, filed on Jan. 14, 2020.
Claims priority of provisional application 62/792,165, filed on Jan. 14, 2019.
Prior Publication US 2020/0341954 A1, Oct. 29, 2020
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01); G06F 40/205 (2020.01); G06F 16/906 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/906 (2019.01); G06F 40/205 (2020.01)]
19 Claims
 
1. A computer-implemented method for monitoring and improving data quality of transaction data, comprising:
receiving, with at least one processor, transaction data associated with a plurality of payment transactions from an acquirer system, the transaction data comprising a transaction record associated with each payment transaction of the plurality of payment transactions, each transaction record comprising a plurality of data fields;
categorizing, with at least one processor, each respective data field of the plurality of data fields into a respective type of a plurality of types; and
determining, with at least one processor, a data quality score for each respective data field of the plurality of data fields based on the respective type of the respective data field, wherein a first subset of data fields of the plurality of data fields comprises numeric data fields, and wherein determining the data quality score for the numeric data fields comprises:
generating, with at least one processor, a vector comprising a plurality of elements, the plurality of elements comprising an element for each value of the numeric data fields and for each interaction of at least two values of the numeric data fields;
generating, with at least one processor, a plurality of tuples based on the plurality of elements;
performing, with at least one processor, a regression on each tuple of the plurality of tuples to provide an error value for each tuple of the plurality of tuples;
determining, with at least one processor, the error value of at least one tuple of the plurality of tuples satisfies a data quality threshold; and
storing, with at least one processor, at least one of a coefficient value, an intercept value, or any combination thereof based on the regression for the at least one tuple that satisfies the data quality threshold.
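The numeric-field steps recited in claim 1 can be illustrated with a minimal sketch. It is one possible reading of the claim language, not the patented implementation, and rests on several assumptions that the claim does not state: an "interaction" is taken to be the pairwise product of two numeric fields, a "tuple" is an ordered pair of elements (predictor, target), the regression is ordinary least squares on a single predictor, the error value is the mean squared error, and the data quality threshold is "satisfied" when the error is at or below it. All function names, field names, and sample values are hypothetical.

```python
from itertools import combinations, permutations


def build_elements(records, numeric_fields):
    """Build one column per numeric field and per pairwise product (assumed 'interaction')."""
    elements = {f: [float(r[f]) for r in records] for f in numeric_fields}
    for a, b in combinations(numeric_fields, 2):
        elements[f"{a}*{b}"] = [x * y for x, y in zip(elements[a], elements[b])]
    return elements


def fit_line(xs, ys):
    """Least-squares fit of y = coef * x + intercept; returns (coef, intercept, mse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        coef = 0.0  # constant predictor: fall back to predicting the mean of y
    else:
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        coef = sxy / sxx
    intercept = mean_y - coef * mean_x
    mse = sum((y - (coef * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n
    return coef, intercept, mse


def learn_rules(records, numeric_fields, threshold):
    """Regress every ordered pair of elements; keep fits whose error is within the threshold."""
    elements = build_elements(records, numeric_fields)
    rules = {}
    for x_name, y_name in permutations(elements, 2):  # assumed meaning of 'tuple'
        coef, intercept, mse = fit_line(elements[x_name], elements[y_name])
        if mse <= threshold:  # assumed meaning of 'satisfies a data quality threshold'
            # Store the coefficient and intercept for the qualifying tuple.
            rules[(x_name, y_name)] = {"coef": coef, "intercept": intercept}
    return rules


# Hypothetical transaction records: tax is exactly 10% of amount, tip is unrelated.
records = [
    {"amount": 10, "tax": 1, "tip": 5},
    {"amount": 20, "tax": 2, "tip": 1},
    {"amount": 30, "tax": 3, "tip": 4},
    {"amount": 40, "tax": 4, "tip": 2},
]
rules = learn_rules(records, ["amount", "tax", "tip"], threshold=1e-9)
for pair in sorted(rules):
    print(pair, rules[pair])
```

In this sketch the stored coefficient and intercept act as a learned relationship between two elements. A later record whose values deviate from that relationship could then be flagged as a data quality problem, although how the stored values are used afterward is outside what claim 1 recites.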