US 12,149,613 B2
Data validation techniques for sensitive data migration across multiple platforms
Rohit Singh, Devon, GA (US); Pinaki Ghosh, Cumming, GA (US); and Joji Varughese, Alpharetta, GA (US)
Assigned to Equifax Inc., Atlanta, GA (US)
Filed by EQUIFAX INC., Atlanta, GA (US)
Filed on Dec. 8, 2021, as Appl. No. 17/643,302.
Prior Publication US 2023/0179401 A1, Jun. 8, 2023
Int. Cl. H04L 9/08 (2006.01); H04L 9/06 (2006.01); H04L 67/06 (2022.01); H04L 67/1097 (2022.01)
CPC H04L 9/0825 (2013.01) [H04L 9/0643 (2013.01); H04L 67/06 (2013.01); H04L 67/1097 (2013.01)] 19 Claims
 
1. A method that includes one or more processing devices performing operations comprising:
transforming data in a first data file stored on a first platform to common data formats, the first data file comprising a first plurality of data records and each data record of the first plurality of data records comprising a plurality of attributes in a first order, wherein the common data formats are associated with data in a second data file stored on a second platform, the second data file comprising a second plurality of data records and each data record of the second plurality of data records comprising the plurality of attributes in a second order;
reordering the data in the first plurality of data records by ordering the data according to the second order of the plurality of attributes based on a determination that the first order of the plurality of attributes differs from the second order of the plurality of attributes;
identifying a first set of values of a primary key for the first plurality of data records;
generating a first set of hash values, each hash value of the first set of hash values generated by applying a hash function on a data record of the first plurality of data records;
receiving a second set of hash values for the second plurality of data records in the second data file stored on the second platform along with a second set of values of the primary key associated with the second set of hash values, each data record of the second plurality of data records comprising the plurality of attributes;
comparing the first set of hash values and the second set of hash values according to the first set of values and the second set of values of the primary key;
determining a location of a mismatch in the first data file stored on the first platform or the second data file stored on the second platform, wherein determining the location of the mismatch comprises:
for each attribute of the plurality of attributes:
partitioning the values in the first plurality of data records for the attribute into multiple chunks,
applying a third hash function on each chunk of each attribute to generate multiple hash values, and
applying a second hash function on each chunk of the multiple chunks to generate a hash value of a third set of hash values, and
comparing the third set of hash values with a fourth set of hash values for the data file to identify the location of the mismatch between first values of the plurality of attributes of the first data file and second values of the plurality of attributes of the second data file; and
causing the first data file stored on the first platform or the second data file stored on the second platform to be modified at the location of the mismatch based on the first set of hash values being different from the second set of hash values.
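The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The SHA-256 hash, the unit-separator join, the whitespace trimming used as the "common data format", the fixed chunk size, and every function and variable name below are assumptions chosen for illustration. The sketch puts both files into the second file's attribute order, hashes each record, compares the hashes by primary key, and then hashes each attribute in chunks to narrow down where a mismatch sits.

```python
import hashlib

SEP = "\x1f"  # assumed field separator for the canonical form


def to_records(rows, order):
    """Turn positional rows into attribute->value records."""
    return [dict(zip(order, row)) for row in rows]


def _digest(values):
    # Assumed common format: stringify, trim whitespace, join with SEP.
    canonical = SEP.join(str(v).strip() for v in values)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def hashes_by_key(records, attribute_order, key):
    """Map each primary-key value to the hash of its reordered record."""
    return {r[key]: _digest(r[a] for a in attribute_order) for r in records}


def mismatched_keys(first_hashes, second_hashes):
    """Primary-key values whose hashes differ or exist on one side only."""
    keys = first_hashes.keys() | second_hashes.keys()
    return sorted(k for k in keys if first_hashes.get(k) != second_hashes.get(k))


def column_chunk_hashes(records, attribute_order, key, chunk_size):
    """For each attribute, hash its values in chunks ordered by primary key."""
    ordered = sorted(records, key=lambda r: r[key])
    result = {}
    for attribute in attribute_order:
        values = [r[attribute] for r in ordered]
        result[attribute] = [
            _digest(values[i:i + chunk_size])
            for i in range(0, len(values), chunk_size)
        ]
    return result


def locate_mismatches(first_chunks, second_chunks):
    """Return (attribute, chunk_index) pairs whose chunk hashes differ."""
    locations = []
    for attribute, a in first_chunks.items():
        b = second_chunks.get(attribute, [])
        for i in range(max(len(a), len(b))):
            if i >= len(a) or i >= len(b) or a[i] != b[i]:
                locations.append((attribute, i))
    return locations


# Hypothetical data: the same four records stored in different attribute orders.
first_order = ["name", "id", "city"]
second_order = ["id", "name", "city"]
first = to_records(
    [("Ann", 1, "Atlanta"), ("Bob", 2, "Cumming"), ("Cy", 3, "Macon"), ("Di", 4, "Athens")],
    first_order,
)
second = to_records(
    [(1, "Ann", "Atlanta"), (2, "Bob ", "Cumming"), (3, "Cy", "Mason"), (4, "Di", "Athens")],
    second_order,
)

# Record-level comparison, with both sides hashed in the second order.
bad_keys = mismatched_keys(
    hashes_by_key(first, second_order, "id"),
    hashes_by_key(second, second_order, "id"),
)

# Chunk-level comparison to locate the mismatch (chunk size of 2 is arbitrary).
locations = locate_mismatches(
    column_chunk_hashes(first, second_order, "id", 2),
    column_chunk_hashes(second, second_order, "id", 2),
)
```

In this example the trailing space in "Bob " disappears during normalization, so only the record with primary key 3 is flagged, and the chunk comparison points to the second chunk of the city attribute. In a cross-platform setting, only the hash dictionaries and chunk-hash lists would need to be exchanged, not the underlying sensitive values.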