CPC G06F 16/285 (2019.01) [G06F 16/288 (2019.01); G06F 18/2148 (2023.01); G06F 18/22 (2023.01); G06N 20/00 (2019.01)] | 20 Claims |
1. A method performed by at least one processor, the method comprising:
obtaining pairs of training records for training an entity resolution model, wherein the pairs of training records comprise a first pair of training records, wherein the first pair of training records comprises a first record that indicates a first set of values for a first attribute and a second record that indicates a second set of values for the first attribute;
determining a set of association metrics corresponding to the pairs of training records at least by:
identifying a first value of the first set of values;
determining a first set of individual association metrics corresponding respectively to comparisons between the first value of the first set of values and each value of the second set of values;
executing a first-level reduction operation for the first set of individual association metrics, across the second set of values, to generate a first reduced association metric;
storing the first reduced association metric in a set of reduced association metrics;
identifying a second value of the first set of values;
determining a second set of individual association metrics corresponding respectively to comparisons between the second value of the first set of values and each value of the second set of values;
executing the first-level reduction operation for the second set of individual association metrics, across the second set of values, to generate a second reduced association metric;
storing the second reduced association metric in the set of reduced association metrics;
excluding the first value and the second value, determining a presence of any more values in the first set of values;
based on determining that no more values are present in the first set of values, executing a second-level reduction operation for the set of reduced association metrics across the first set of values to generate a set of association metrics corresponding to the first pair of training records; and
applying a machine learning algorithm to the set of association metrics corresponding to the pairs of training records to train the entity resolution model.
|
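The two-level reduction recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation. The claim does not name a comparison function, a reduction operator, or a learning algorithm, so the `difflib` string similarity, the `max` reductions, the tiny logistic regression, and the sample records below are all hypothetical choices. The sketch also yields a single metric per record pair for one attribute, whereas the claim speaks of a set of association metrics, and it assumes both value sets are non-empty.

```python
import math
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Sequence


def similarity(a: str, b: str) -> float:
    """Hypothetical individual association metric: string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_metric(
    first_values: Sequence[str],
    second_values: Sequence[str],
    first_level: Callable[[Iterable[float]], float] = max,
    second_level: Callable[[Iterable[float]], float] = max,
) -> float:
    """Two-level reduction over one multi-valued attribute of a record pair."""
    reduced: List[float] = []
    for value in first_values:  # visit each value of the first set in turn
        # Compare this value against every value of the second set.
        individual = [similarity(value, other) for other in second_values]
        # First-level reduction: collapse across the second set of values.
        reduced.append(first_level(individual))
    # No values remain in the first set, so apply the second-level
    # reduction across the first set of values.
    return second_level(reduced)


def train_logistic(
    features: List[List[float]],
    labels: List[int],
    epochs: int = 500,
    lr: float = 0.5,
) -> List[float]:
    """Assumed learner: logistic regression fitted by stochastic gradient steps.

    Returns one weight per feature followed by a bias term.
    """
    weights = [0.0] * (len(features[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = weights[-1] + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                weights[i] += lr * (y - p) * xi
            weights[-1] += lr * (y - p)
    return weights


# Hypothetical training pairs: (first set of values, second set of values, label),
# where label 1 means the two records refer to the same entity.
training_pairs = [
    (["Jon Smith", "J. Smith"], ["John Smith", "Smith, John"], 1),
    (["Acme Corp"], ["ACME Corporation", "Acme Co."], 1),
    (["Jon Smith"], ["Maria Garcia", "M. Garcia"], 0),
    (["Acme Corp"], ["Globex Inc"], 0),
]
features = [[pair_metric(a, b)] for a, b, _ in training_pairs]
labels = [label for _, _, label in training_pairs]
weights = train_logistic(features, labels)
```

Passing the reductions as parameters keeps the two levels independent. For example, `pair_metric(a, b, first_level=max, second_level=min)` would require every value of the first record to have a close match in the second, while the default `max`/`max` accepts a single strong match.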