US 12,242,514 B2
Multi-level conflict-free entity clusters
Yan Yan, Seattle, WA (US); Stephen Keith Meyles, Seattle, WA (US); Graeme Andrew Kyle Roche, Seattle, WA (US); Jeffrey Allen Stokes, Seattle, WA (US); Carlos Minoru Sakoda, Seattle, WA (US); and Dan Suciu, Seattle, WA (US)
Assigned to AMPERITY, INC., Seattle, WA (US)
Filed by AMPERITY, INC., Seattle, WA (US)
Filed on May 10, 2021, as Appl. No. 17/316,293.
Application 17/316,293 is a continuation of application No. 16/399,162, filed on Apr. 30, 2019, granted, now 11,003,643.
Prior Publication US 2021/0263903 A1, Aug. 26, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/28 (2019.01); G06F 16/22 (2019.01); G06F 18/23 (2023.01); G06F 40/177 (2020.01); G06F 40/18 (2020.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 16/221 (2019.01); G06F 16/282 (2019.01); G06F 18/23 (2023.01); G06F 40/177 (2020.01); G06F 40/18 (2020.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)] 17 Claims
OG exemplary drawing
 
1. A method comprising:
performing pairwise comparisons on a set of records, the pairwise comparisons comprising, for a given record, comparing the given record to other records in the set of records;
generating feature signatures associated with each of the pairwise comparisons, a given feature signature comprising a vector representing a likelihood that two respective records associated with the feature signature relate to a single entity;
inputting the feature signatures into a trained ordinal classifier to obtain a first set of match scores predicted by the trained ordinal classifier, wherein the trained ordinal classifier is configured using ordinal training data and hard conflict rules, and wherein the ordinal classifier generates non-binary output labels indicating at least one of a strong match, a moderate match, a weak match, an unknown match, and a hard conflict;
generating, based on the first set of match scores, a first cluster of records and a second cluster of records;
inputting the first cluster of records and the second cluster of records into the ordinal classifier to obtain a second set of match scores;
determining whether a hard conflict exists between the first cluster of records and the second cluster of records based on the second set of match scores;
generating a hierarchical clustering based on the first set of match scores, the second set of match scores, and the determination of whether a hard conflict exists;
assigning hierarchical cluster identifiers to records in the set of records based on the hierarchical clustering, wherein a hierarchical cluster identifier for a given record comprises a series of values, each value reflecting a respective tier within the hierarchical clustering; and
generating a processed database table with the hierarchical cluster identifiers, wherein the hierarchical cluster identifiers allow selection of clusters according to different degrees of confidence.
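The first two steps of claim 1, pairwise comparison of records and generation of a feature signature for each pair, can be illustrated with a minimal Python sketch. The field names, the helper names, and the use of `difflib.SequenceMatcher` as a string-similarity measure are all assumptions for illustration. The claim does not say how features are computed, and a per-field similarity vector is only a stand-in for a vector that represents a match likelihood.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical record shape: field name -> string value.
Record = Dict[str, str]

# Hypothetical field list; the claim does not name any fields.
FIELDS: Sequence[str] = ("name", "email", "phone")


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 0.0 when either value is missing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def feature_signature(r1: Record, r2: Record) -> List[float]:
    """One similarity feature per field for a single record pair."""
    return [field_similarity(r1.get(f, ""), r2.get(f, "")) for f in FIELDS]


def pairwise_signatures(records: List[Record]) -> Dict[Tuple[int, int], List[float]]:
    """Compare every record with every other record, once per unordered pair."""
    return {
        (i, j): feature_signature(records[i], records[j])
        for i, j in combinations(range(len(records)), 2)
    }


records = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Anne Lee", "email": "ann@example.com"},
    {"name": "Bob Ray", "email": "bob@example.com"},
]
sigs = pairwise_signatures(records)
print(sorted(sigs))  # [(0, 1), (0, 2), (1, 2)]
print(sigs[(0, 1)][1:])  # [1.0, 0.0]  (identical email, missing phone)
```

Comparing every record with every other record grows quadratically with the number of records. The claim does not address how comparisons are limited, so the sketch compares all pairs.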
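The ordered label set recited in claim 1, and the way a hard conflict rule takes precedence over a score, can be sketched as follows. This is not a trained model. A hypothetical weighted sum with fixed cut points stands in for the trained ordinal classifier. `conflicting_field` is a hypothetical example of a hard conflict rule: two records carry different non-empty values for a field that must identify one entity.

```python
from enum import IntEnum
from typing import Callable, Dict, List, Sequence


class MatchLabel(IntEnum):
    # Ordered so that a larger value means stronger evidence for a match.
    HARD_CONFLICT = 0
    UNKNOWN = 1
    WEAK = 2
    MODERATE = 3
    STRONG = 4


Record = Dict[str, str]
HardRule = Callable[[Record, Record], bool]

# Hypothetical cut points; a trained ordinal model would learn these from data.
# score < 0.4 -> UNKNOWN, < 0.6 -> WEAK, < 0.85 -> MODERATE, otherwise STRONG.
CUTS = (0.4, 0.6, 0.85)


def conflicting_field(field: str) -> HardRule:
    """Hypothetical hard rule: both records set `field`, to different values."""
    def rule(a: Record, b: Record) -> bool:
        return bool(a.get(field)) and bool(b.get(field)) and a[field] != b[field]
    return rule


def classify(sig: Sequence[float], a: Record, b: Record,
             weights: Sequence[float], rules: List[HardRule]) -> MatchLabel:
    """Map a feature signature to an ordinal label; hard rules override the score."""
    if any(rule(a, b) for rule in rules):
        return MatchLabel.HARD_CONFLICT
    score = sum(w * x for w, x in zip(weights, sig))
    # Count how many cut points the score clears and step up from UNKNOWN.
    return MatchLabel(MatchLabel.UNKNOWN + sum(score >= c for c in CUTS))


rules = [conflicting_field("ssn")]
weights = (0.5, 0.5, 0.0)
print(classify([0.93, 1.0, 0.0], {}, {}, weights, rules).name)  # STRONG
print(classify([1.0, 1.0, 1.0], {"ssn": "1"}, {"ssn": "2"}, weights, rules).name)  # HARD_CONFLICT
```

Because the labels are ordered integers, later steps can compare them directly, for example "at least MODERATE". A binary matched or unmatched output does not support that kind of comparison.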
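Claim 1 scores the first cluster against the second and determines whether the two clusters are in hard conflict. The sketch below shows one way that check could work. The claim feeds the clusters back into the ordinal classifier. Here, as a swapped-in simplification, the cluster-level label is aggregated from the pairwise labels of the clusters' members. Any hard conflict between members vetoes the pair of clusters. Otherwise the strongest cross-pair label is kept, which amounts to single linkage. The names `cluster_label`, `has_hard_conflict` and `lookup` are hypothetical.

```python
from enum import IntEnum
from itertools import product
from typing import Callable, Iterable, Optional


class MatchLabel(IntEnum):
    HARD_CONFLICT = 0
    UNKNOWN = 1
    WEAK = 2
    MODERATE = 3
    STRONG = 4


# Hypothetical callback returning the pairwise label for two record indices.
PairLabeler = Callable[[int, int], MatchLabel]


def cluster_label(c1: Iterable[int], c2: Iterable[int], label: PairLabeler) -> MatchLabel:
    """Aggregate cross-cluster pair labels into a single cluster-level label."""
    best: Optional[MatchLabel] = None
    for i, j in product(c1, c2):
        lab = label(i, j)
        if lab == MatchLabel.HARD_CONFLICT:
            return MatchLabel.HARD_CONFLICT  # one conflicting pair vetoes the clusters
        if best is None or lab > best:
            best = lab
    return best if best is not None else MatchLabel.UNKNOWN


def has_hard_conflict(c1: Iterable[int], c2: Iterable[int], label: PairLabeler) -> bool:
    return cluster_label(c1, c2, label) == MatchLabel.HARD_CONFLICT


pair_labels = {(0, 2): MatchLabel.WEAK, (1, 2): MatchLabel.HARD_CONFLICT}


def lookup(i: int, j: int) -> MatchLabel:
    return pair_labels.get((min(i, j), max(i, j)), MatchLabel.UNKNOWN)


print(has_hard_conflict([0, 1], [2], lookup))  # True
print(cluster_label([0], [2], lookup).name)  # WEAK
```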
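Claim 1 ends with a processed database table and selection of clusters at a chosen degree of confidence. This can be illustrated with Python's built-in `sqlite3` module. The table name `records` and the columns `tier_1` (coarsest) through `tier_3` (strictest) are hypothetical. The identifiers in the usage example are hand-written values shaped like the output of a three-tier clustering.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

# Hypothetical column names; tier_1 is the coarsest (lowest-confidence) level.
TIER_COLUMNS = ("tier_1", "tier_2", "tier_3")


def build_table(conn: sqlite3.Connection,
                rows: Sequence[Tuple[int, Tuple[int, int, int]]]) -> None:
    """Store each record ID alongside its hierarchical cluster identifier."""
    conn.execute(
        "CREATE TABLE records (record_id INTEGER PRIMARY KEY,"
        " tier_1 INTEGER, tier_2 INTEGER, tier_3 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?, ?)",
        [(rid, *hid) for rid, hid in rows],
    )


def clusters_at(conn: sqlite3.Connection, depth: int) -> Dict[Tuple[int, ...], List[int]]:
    """Group record IDs by the first `depth` identifier values.

    depth=3 gives the strictest (highest-confidence) clusters, depth=1 the broadest.
    """
    if not 1 <= depth <= len(TIER_COLUMNS):
        raise ValueError("depth out of range")
    # Column names come from a fixed tuple, so building the SELECT list is safe.
    cols = ", ".join(TIER_COLUMNS[:depth])
    out: Dict[Tuple[int, ...], List[int]] = {}
    for row in conn.execute(f"SELECT {cols}, record_id FROM records ORDER BY record_id"):
        out.setdefault(tuple(row[:-1]), []).append(row[-1])
    return out


conn = sqlite3.connect(":memory:")
build_table(conn, [(0, (0, 0, 0)), (1, (0, 0, 0)), (2, (0, 0, 2)),
                   (3, (3, 3, 3)), (4, (3, 3, 3))])
print(clusters_at(conn, 3))  # {(0, 0, 0): [0, 1], (0, 0, 2): [2], (3, 3, 3): [3, 4]}
print(clusters_at(conn, 1))  # {(0,): [0, 1, 2], (3,): [3, 4]}
```

Storing one column per tier lets a consumer pick a confidence level with a plain `GROUP BY` or `SELECT`, without re-running the clustering.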