US 11,657,050 B1
Data reconciliation for big data environments
Yifei Hong, Livingston, NJ (US)
Assigned to Bank of America Corporation, Charlotte, NC (US)
Filed by Bank of America Corporation, Charlotte, NC (US)
Filed on Dec. 16, 2021, as Appl. No. 17/552,439.
Int. Cl. G06F 16/245 (2019.01); G06F 16/2453 (2019.01); G06F 16/242 (2019.01)
CPC G06F 16/24537 (2019.01) [G06F 16/2425 (2019.01)]
15 Claims
 
1. A method for performing data reconciliation in a big data environment, the method comprising:
receiving identification of a first big data set and identification of a second big data set;
identifying a first set of metadata associated with the first big data set, the first set of metadata comprising schema information and data type information of the first big data set;
identifying a second set of metadata associated with the second big data set, the second set of metadata comprising a set of schema information and a set of data type information of the second big data set;
identifying a first subset of metadata within the first set of metadata and a second subset of metadata within the second set of metadata, the first subset corresponding to the second subset of metadata;
generating a dimension, based on the first subset of metadata, that corresponds to the second subset of metadata, said dimension including a first set of fields from the first big data set and a second set of fields from the second big data set, wherein the first set of fields corresponds to the second set of fields;
dynamically constructing a structured query language (SQL) query to compare the first big data set to the second big data set, said constructing using the dimension as a join set of data fields within the SQL query, the SQL query further comprising a where clause, the where clause including each of a set of column names in the first big data set that correspond to a second set of column names in the second big data set;
executing the SQL query on the first big data set and the second big data set; and
generating a result set from the SQL query.
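
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; it is one possible reading of the claim under stated assumptions. SQLite stands in for the big data environment, the table names ledger_a and ledger_b and all helper function names are hypothetical, the "dimension" is taken to be the columns that share a name and declared type in both tables, and the key column used for the join is supplied by the caller.

```python
import sqlite3


def table_metadata(conn, table):
    """Return {column_name: declared_type} for a table (schema and data type metadata)."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in rows}


def build_dimension(meta_a, meta_b):
    """Assumption: corresponding fields are those with the same name and declared type."""
    return [col for col, typ in meta_a.items() if meta_b.get(col) == typ]


def build_reconciliation_query(table_a, table_b, key_fields, compare_fields):
    """Construct a SQL query that joins on key_fields and reports differing rows."""
    if not key_fields or not compare_fields:
        raise ValueError("need at least one key field and one compare field")
    join = " AND ".join(f'a."{c}" = b."{c}"' for c in key_fields)
    # IS NOT is SQLite's NULL-safe inequality; other engines use different syntax.
    where = " OR ".join(f'a."{c}" IS NOT b."{c}"' for c in compare_fields)
    columns = [f'a."{c}"' for c in key_fields]
    for c in compare_fields:
        columns += [f'a."{c}" AS "a_{c}"', f'b."{c}" AS "b_{c}"']
    return (
        f'SELECT {", ".join(columns)} '
        f'FROM "{table_a}" AS a JOIN "{table_b}" AS b ON {join} '
        f"WHERE {where}"
    )


# Hypothetical sample data: two small tables standing in for the two data sets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger_a (id INTEGER, amount REAL, currency TEXT, note TEXT)")
conn.execute("CREATE TABLE ledger_b (id INTEGER, amount REAL, currency TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO ledger_a VALUES (?, ?, ?, ?)",
    [(1, 10.0, "USD", "x"), (2, 20.0, "USD", "y"), (3, 30.0, "EUR", "z")],
)
conn.executemany(
    "INSERT INTO ledger_b VALUES (?, ?, ?, ?)",
    [(1, 10.0, "USD", "s"), (2, 25.0, "USD", "s"), (3, 30.0, "EUR", "s")],
)

# Identify metadata, derive the dimension, then build and run the query.
dimension = build_dimension(
    table_metadata(conn, "ledger_a"), table_metadata(conn, "ledger_b")
)
key_fields = ["id"]  # assumption: the caller knows which dimension field is the key
compare_fields = [c for c in dimension if c not in key_fields]
query = build_reconciliation_query("ledger_a", "ledger_b", key_fields, compare_fields)
result = conn.execute(query).fetchall()
print(query)
print(result)
```

In this sketch the columns note and source fall outside the dimension because they have no counterpart in the other table, so they take no part in the join or the where clause. The result set contains only rows whose keys match but whose compared fields differ; rows present in just one table would need an outer join, which the sketch does not attempt.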