US 11,687,574 B2
Record matching in a database system
Lars Bremer, Boeblingen (DE); Martin Oberhofer, Sindelfingen (DE); Karin Steckler, Herrenberg (DE); Mariya Chkalova, Stuttgart (DE); Michael Baessler, Bempflingen (DE); and Holger Koenig, Boblingen (DE)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Mar. 29, 2021, as Appl. No. 17/215,071.
Prior Publication US 2022/0309084 A1, Sep. 29, 2022
Int. Cl. G06F 16/332 (2019.01); G06F 16/335 (2019.01); G06F 16/33 (2019.01); G06F 16/338 (2019.01)
CPC G06F 16/3322 (2019.01) [G06F 16/334 (2019.01); G06F 16/335 (2019.01); G06F 16/338 (2019.01)]
20 Claims
 
1. A computer implemented method for record matching in a database system, the method comprising:
identifying records representing respective entities, wherein a record of the identified records comprises structured attributes;
assigning an initial contribution weight to the structured attributes;
identifying one or more unstructured data objects corresponding to the records;
processing the one or more unstructured data objects to identify unstructured attribute values corresponding to respective records of the identified records;
identifying entity relation scores corresponding to the identified records, wherein an entity relation score indicates how often an entity represented by a record occurs alongside a selected entity;
comparing two records based, at least in part, on the updated contribution weight of the selected structured attribute and a comparison of the entity relation scores and the unstructured attribute values of the two records to determine a similarity level between the two records;
selecting unstructured attribute values that are present with respect to the identified records; and
responsive to determining a structured attribute value of the structured attribute values does not match any of the selected unstructured attributes, replacing the contribution weight of said structured attribute by an updated contribution weight indicative of the similarity between the two records of the identified records.
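
For illustration only, the steps recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not specify how the updated contribution weight is computed, how the entity relation scores are derived, or how the partial comparisons are combined. The `Record` class, the function names, the `penalty` factor of 0.5, the co-occurrence fraction, the overlap measures, and the equal-weight average are all hypothetical choices made here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record: structured attributes plus data mined from unstructured objects."""
    structured: dict[str, str]
    unstructured_values: set[str] = field(default_factory=set)
    relation_scores: dict[str, float] = field(default_factory=dict)


def cooccurrence_scores(entity: str, mentions: list[set[str]]) -> dict[str, float]:
    """Assumed scoring: fraction of documents mentioning `entity` that also mention another entity."""
    containing = [m for m in mentions if entity in m]
    counts: dict[str, int] = {}
    for doc in containing:
        for other in doc - {entity}:
            counts[other] = counts.get(other, 0) + 1
    return {other: n / len(containing) for other, n in counts.items()}


def update_weights(record: Record, initial: dict[str, float], penalty: float = 0.5) -> dict[str, float]:
    """Replace the weight of any structured attribute whose value matches no unstructured value."""
    seen = {v.lower() for v in record.unstructured_values}
    weights = dict(initial)
    for attr, value in record.structured.items():
        if value.lower() not in seen:
            # Assumption: the updated weight is the initial weight scaled by `penalty`.
            weights[attr] = initial[attr] * penalty
    return weights


def similarity(a: Record, b: Record, wa: dict[str, float], wb: dict[str, float]) -> float:
    """Combine structured, unstructured, and entity-relation comparisons (equal parts, assumed)."""
    total = matched = 0.0
    for attr in a.structured.keys() & b.structured.keys():
        w = min(wa.get(attr, 0.0), wb.get(attr, 0.0))
        total += w
        if a.structured[attr].lower() == b.structured[attr].lower():
            matched += w
    structured = matched / total if total else 0.0

    ua = {v.lower() for v in a.unstructured_values}
    ub = {v.lower() for v in b.unstructured_values}
    unstructured = len(ua & ub) / len(ua | ub) if ua | ub else 0.0

    keys = a.relation_scores.keys() | b.relation_scores.keys()
    low = sum(min(a.relation_scores.get(k, 0.0), b.relation_scores.get(k, 0.0)) for k in keys)
    high = sum(max(a.relation_scores.get(k, 0.0), b.relation_scores.get(k, 0.0)) for k in keys)
    relation = low / high if high else 0.0

    return (structured + unstructured + relation) / 3


# Made-up example data: entity mentions extracted from three unstructured documents.
docs = [{"Anna Schmidt", "IBM"}, {"Anna Schmidt", "IBM", "Stuttgart"}, {"A. Schmidt", "IBM"}]
initial_weights = {"name": 1.0, "city": 1.0}

rec_a = Record({"name": "Anna Schmidt", "city": "Stuttgart"}, {"Stuttgart", "IBM"},
               cooccurrence_scores("Anna Schmidt", docs))
rec_b = Record({"name": "A. Schmidt", "city": "Berlin"}, {"Stuttgart", "IBM"},
               cooccurrence_scores("A. Schmidt", docs))

weights_a = update_weights(rec_a, initial_weights)
weights_b = update_weights(rec_b, initial_weights)
score = similarity(rec_a, rec_b, weights_a, weights_b)
```

In this sketch, a structured value that is not corroborated by any unstructured value contributes less to the comparison, so two records whose structured attributes disagree can still reach a moderate similarity level when their unstructured values and entity relation scores overlap.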