US 11,914,621 B2
Determining an association metric for record attributes associated with cardinalities that are not necessarily the same for training and applying an entity resolution model
Benjamin James Campbell Blalock, Astoria, NY (US); Alexander Graham Glenday, Brooklyn, NY (US); and Jason Richard Prestinario, Brooklyn, CA (US)
Assigned to KOMODO HEALTH, San Francisco, CA (US)
Filed by KOMODO HEALTH, San Francisco, CA (US)
Filed on May 8, 2019, as Appl. No. 16/406,267.
Prior Publication US 2020/0356816 A1, Nov. 12, 2020
Int. Cl. G06F 16/28 (2019.01); G06N 20/00 (2019.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01)
CPC G06F 16/285 (2019.01) [G06F 16/288 (2019.01); G06F 18/2148 (2023.01); G06F 18/22 (2023.01); G06N 20/00 (2019.01)] 20 Claims
1. A method performed by at least one processor, the method comprising:
obtaining pairs of training records for training an entity resolution model, wherein the pairs of training records comprise a first pair of training records, wherein the first pair of training records comprises a first record that indicates a first set of values for a first attribute and a second record that indicates a second set of values for the first attribute;
determining a set of association metrics corresponding to the pairs of training records at least by:
identifying a first value of the first set of values;
determining a first set of individual association metrics corresponding respectively to comparisons between the first value of the first set of values and each value of the second set of values;
executing a first-level reduction operation for the first set of individual association metrics, across the second set of values, to generate a first reduced association metric;
storing the first reduced association metric in a set of reduced association metrics;
identifying a second value of the first set of values;
determining a second set of individual association metrics corresponding respectively to comparisons between the second value of the first set of values and each value of the second set of values;
executing the first-level reduction operation for the second set of individual association metrics, across the second set of values, to generate a second reduced association metric;
storing the second reduced association metric in the set of reduced association metrics;
determining whether any values other than the first value and the second value are present in the first set of values;
based on determining that no more values are present in the first set of values, executing a second-level reduction operation for the set of reduced association metrics across the first set of values to generate a set of association metrics corresponding to the first pair of training records; and
applying a machine learning algorithm to the set of association metrics corresponding to the pairs of training records to train the entity resolution model.
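The claimed steps can be illustrated with a minimal sketch. It rests on several assumptions that the claim does not specify. The individual association metric is taken to be a case-insensitive string similarity from Python's `difflib.SequenceMatcher`. The first-level reduction is `max` across the second set of values. The second-level reduction produces max, mean and min across the first set, so that each record pair yields a set of metrics. A small logistic regression fitted by gradient descent stands in for "a machine learning algorithm". All function names (`individual_metric`, `association_metrics`, `train_logistic`, `predict`) and the toy training pairs are hypothetical. Note that the two sets of values in a pair need not have the same cardinality.

```python
from difflib import SequenceMatcher
from math import exp
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def individual_metric(a: str, b: str) -> float:
    """Hypothetical individual association metric: case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def association_metrics(
    first_values: Sequence[str],
    second_values: Sequence[str],
    first_level: Callable[[List[float]], float] = max,
) -> List[float]:
    """Two-level reduction over all value pairs; the two sets may differ in size."""
    if not first_values or not second_values:
        return [0.0, 0.0, 0.0]  # assumption: an empty attribute yields zero association
    reduced: List[float] = []  # the set of reduced association metrics
    for value in first_values:
        # Individual metrics: this value compared against each value of the second set.
        individual = [individual_metric(value, other) for other in second_values]
        # First-level reduction across the second set of values.
        reduced.append(first_level(individual))
    # Second-level reduction across the first set of values (hypothetical choice:
    # max, mean and min, giving a set of metrics for the record pair).
    return [max(reduced), mean(reduced), min(reduced)]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


Model = Tuple[List[float], float]


def predict(model: Model, x: Sequence[float]) -> float:
    """Probability that a record pair refers to the same entity."""
    weights, bias = model
    return _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(features: List[List[float]], labels: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Model:
    """Stand-in ML algorithm: logistic regression fitted by stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = predict((weights, bias), x) - y  # d(log-loss)/dz
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical training pairs: (first record's values, second record's values, match label).
training_pairs = [
    (["Jon Smith", "J. Smith"], ["John Smith"], 1),
    (["Mary Jones"], ["M. Jones", "Mary A. Jones"], 1),
    (["Alice Wu"], ["Robert Brown"], 0),
    (["Carlos Diaz", "C. Diaz"], ["Dana Lee"], 0),
]
X = [association_metrics(a, b) for a, b, _ in training_pairs]
y = [label for _, _, label in training_pairs]
model = train_logistic(X, y)
```

Because `first_level` is a parameter, other reductions (for example `min` or `statistics.mean`) can be swapped in without touching the loop. Emitting several summary statistics at the second level is just one way to turn a variable number of values into a fixed-length feature vector for the model.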