US 11,681,689 B2
Automatic generation of a matching algorithm in master data management
Abhishek Seth, Deoband (IN); Soma Shekar Naganna, Bangalore (IN); James Albert O'Neill, Jr., Austin, TX (US); Geetha Sravanthi Pulipaty, Bangalore (IN); and Neeraj Ramkrishna Singh, Bangalore (IN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jun. 9, 2021, as Appl. No. 17/342,895.
Prior Publication US 2022/0398241 A1, Dec. 15, 2022
Int. Cl. G06F 16/23 (2019.01); G06F 16/22 (2019.01)
CPC G06F 16/2379 (2019.01) [G06F 16/2255 (2019.01); G06F 16/2272 (2019.01)]
12 Claims
1. A computer-implemented method (CIM) for use in a master data management (MDM) environment including a plurality of master data records stored in a set of storage device(s), the CIM comprising:
receiving a plurality of additional records that include data that is to be incorporated into the plurality of master data records;
automatically generating a full matching algorithm to determine whether subject matter of data of each additional record matches any of the records of the plurality of master data records, with the full matching algorithm including code for performing the following operations:
determining a record type for the additional records using classifiers and an internal domain knowledge corpus,
calculating a Jaccard coefficient for a plurality of candidate lists,
assigning each additional data record to a match set based on completeness and similarity of natures of attributes of the additional record, and
for each given additional record of the plurality of additional records, assigning the given additional record to a comparison group based on completeness and similarity of natures of attributes of the given additional data record;
applying the full matching algorithm to determine whether each additional record matches an existing master data record; and
for additional data records that match an existing master data record, merging the matching additional data record with its matching master data record in the set of storage device(s) to generate an updated version of the matching master data record that includes data from the matching additional record.
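As an illustration only, three of the operations recited in claim 1 (the Jaccard coefficient, grouping by attribute completeness, and merging a matched record) can be sketched in Python. The claim does not say how candidate lists are built, how completeness is measured, or which value wins on merge, so the function names, the dictionary record layout, and the "fill only empty master fields" rule below are all assumptions made for this sketch and are not taken from the patent.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two lists treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty candidate lists are scored as dissimilar.
        return 0.0
    return len(set_a & set_b) / len(union)


def comparison_group(record, attributes):
    """Hypothetical completeness key: sorted tuple of populated attributes."""
    return tuple(sorted(name for name in attributes if record.get(name)))


def merge(master, additional):
    """Return an updated copy of master with gaps filled from additional.

    Assumption: existing master values are kept; only empty or missing
    fields are taken from the additional record.
    """
    updated = dict(master)
    for key, value in additional.items():
        if value and not updated.get(key):
            updated[key] = value
    return updated


# Two candidate lists sharing 2 of 4 distinct tokens.
print(jaccard(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```

Grouping records by which attributes are populated means that only records with comparable attribute sets are scored against each other, which is one plausible reading of "completeness and similarity of natures of attributes".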