| CPC G06F 16/215 (2019.01) [G06F 16/906 (2019.01); G06F 40/205 (2020.01)] | 18 Claims |

|
1. A computer-implemented method for monitoring and improving data quality of transaction data, comprising:
receiving, with at least one processor, transaction data associated with a plurality of payment transactions from an acquirer system, the transaction data comprising a transaction record associated with each payment transaction of the plurality of payment transactions, each transaction record comprising a plurality of data fields;
categorizing, with at least one processor, each respective data field of the plurality of data fields into a respective type of a plurality of types; and
determining, with at least one processor, a data quality score for each respective data field of the plurality of data fields based on the respective type of the respective data field, wherein a first data field of the plurality of data fields comprises a textual data field, and wherein determining the data quality score for the textual data field comprises:
conducting, with at least one processor, data pre-processing on the transaction data;
determining, with at least one processor, feature values associated with the textual data field in each transaction record, wherein the feature values are used in a parsing layer of a natural language processing (NLP) model after conducting data pre-processing on the transaction data;
determining, with at least one processor, whether the feature values associated with the textual data field satisfy at least one rule associated with the parsing layer of the NLP model; and
determining, with at least one processor, the data quality score for the textual data field included in the transaction data based on determining whether the feature values associated with the textual data field satisfy the at least one rule associated with the parsing layer of the NLP model,
wherein the plurality of types comprises a date type, a categorical type, an identifier type, a textual type, and a numeric type,
wherein categorizing comprises categorizing each respective data field of the plurality of data fields into one of the date type, the categorical type, the identifier type, the textual type, or the numeric type, and
wherein categorizing comprises, for each respective data field:
categorizing the respective data field into the date type if data contained in the respective data field at least one of: is formatted as at least one of a standard date format, a standard time format, a standard date and time format, or any combination thereof; satisfies at least one of a date function, a time function, a date and time function, or any combination thereof; or any combination thereof;
categorizing the respective data field into the categorical type based on a statistical distribution of values in the data contained in the respective data field and a threshold of unique values;
categorizing the respective data field into the identifier type based on a degree of uniqueness of the values in the data contained in the respective data field;
categorizing the respective data field into the textual type based on at least one of a plurality of regular expression functions; a number of combinations of punctuation, alphabetical characters, and digits of the data contained in the respective data field; or any combination thereof; and
categorizing the respective data field into the numeric type if the data contained in the respective data field includes only digits and up to one decimal point.
|
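The categorizing step of claim 1 can be illustrated with a minimal sketch. This is not the claimed implementation, only one possible reading of the five type rules. The function names (`is_date`, `categorize_field`), the list of date formats, both thresholds, and the order in which the rules are tried are all assumptions made for illustration; the claim does not specify any of them. The textual type is treated here as the fallback rather than being detected by regular expression functions, which is a simplification.

```python
import re
from datetime import datetime

# Assumed set of "standard" date/time formats; the claim does not enumerate them.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S", "%H:%M:%S")

# Numeric type: only digits and up to one decimal point.
NUMERIC_RE = re.compile(r"[0-9]+(\.[0-9]*)?|\.[0-9]+")


def is_date(value):
    """Return True if the value parses under any of the assumed formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def categorize_field(values, max_unique=20, max_categorical_ratio=0.5,
                     min_identifier_ratio=0.95):
    """Assign one of the five types to a data field given its column of values.

    All thresholds are hypothetical defaults chosen for this sketch.
    """
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return "textual"  # assumption: empty fields fall through to textual

    unique_count = len(set(values))
    unique_ratio = unique_count / len(values)

    # Date type: every value matches a standard date/time format.
    if all(is_date(v) for v in values):
        return "date"
    # Categorical type: few unique values that repeat often.
    if unique_count <= max_unique and unique_ratio <= max_categorical_ratio:
        return "categorical"
    # Numeric type: only digits and at most one decimal point.
    if all(NUMERIC_RE.fullmatch(v) for v in values):
        return "numeric"
    # Identifier type: nearly all values are unique single tokens.
    if unique_ratio >= min_identifier_ratio and not any(" " in v for v in values):
        return "identifier"
    # Textual type: fallback for mixed letters, digits, and punctuation.
    return "textual"
```

Because the rules are tried in a fixed order, a column of all-digit identifiers would be labeled numeric under this sketch; a different ordering or additional rules would be needed to separate those cases.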