CPC G06F 16/24578 (2019.01) [G06F 16/285 (2019.01); G06F 16/35 (2019.01); G06F 16/9535 (2019.01); G06F 18/23 (2023.01)] | 20 Claims |
1. A computer-implemented method comprising:
generating a plurality of record pairs, wherein each record pair in the plurality of record pairs comprises a respective first record from a first plurality of records and a respective second record from a second plurality of records;
applying a machine learning model to determine respective probabilities, for each of the plurality of record pairs, that the respective first record and second record of the respective record pairs are associated with a respective same entity;
causing a client computing device to present any indeterminate record pairs to a user, wherein indeterminate record pairs are identified based at least in part on the respective determined probabilities for individual record pairs of the plurality of record pairs being below a pre-established threshold;
receiving, from the client computing device, user feedback indicating whether the first and second record of an indeterminate record pair are associated with the same entity;
retraining the machine learning model and revising the probability of the indeterminate record pair based at least in part on the user feedback;
determining, based at least in part on the probabilities, respective entities associated with one or more clusters of record pairs; and
outputting the clusters of record pairs and the respective entities associated with each cluster to the client computing device.
|
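The steps of claim 1 can be illustrated with a minimal sketch. This is not the patented implementation; every name in it (`PairModel`, `resolve`, `ask_user`, `low`, `high`) is hypothetical. It assumes records are plain, distinct strings and uses a one-feature logistic model over a string-similarity score as a stand-in for the machine learning model. The claim only requires that indeterminate pairs fall below a pre-established threshold (`high` here); the lower bound `low`, which skips obvious non-matches, is an added assumption. The sketch also groups individual records linked by matching pairs, a simplification of the claimed clusters of record pairs.

```python
import math
from difflib import SequenceMatcher
from itertools import product


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in feature)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class PairModel:
    """Toy one-feature logistic model: p = sigmoid(w * sim + b)."""

    def __init__(self, w=10.0, b=-7.0):
        self.w, self.b = w, b

    def predict(self, sim):
        return 1.0 / (1.0 + math.exp(-(self.w * sim + self.b)))

    def retrain(self, examples, lr=0.5, epochs=200):
        # Plain stochastic gradient descent on log-loss over (sim, label) pairs.
        for _ in range(epochs):
            for sim, label in examples:
                err = self.predict(sim) - label
                self.w -= lr * err * sim
                self.b -= lr * err


def resolve(first, second, model, ask_user, low=0.2, high=0.8):
    # Generate record pairs: one record from each plurality.
    pairs = list(product(first, second))
    sims = {p: similarity(*p) for p in pairs}
    # Apply the model to get a same-entity probability per pair.
    probs = {p: model.predict(sims[p]) for p in pairs}
    # Indeterminate pairs: below the threshold `high` (and, by assumption,
    # at or above `low`). These are presented to the user for feedback.
    indeterminate = [p for p in pairs if low <= probs[p] < high]
    feedback = {p: ask_user(*p) for p in indeterminate}
    if feedback:
        # Retrain on the feedback and revise the reviewed pairs' probabilities.
        model.retrain([(sims[p], 1.0 if same else 0.0)
                       for p, same in feedback.items()])
        for p, same in feedback.items():
            probs[p] = 1.0 if same else 0.0
    # Cluster records linked by confident matches, using union-find.
    parent = {r: r for r in list(first) + list(second)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), p in probs.items():
        if p >= high:
            parent[find(a)] = find(b)
    clusters = {}
    for r in parent:
        clusters.setdefault(find(r), set()).add(r)
    # Associate each cluster with an entity identifier (arbitrary labels).
    entities = {f"entity-{i}": sorted(members)
                for i, members in enumerate(clusters.values())}
    return entities, probs


if __name__ == "__main__":
    # Stand-in for the client device: the "user" confirms every pair shown.
    entities, _ = resolve(
        ["Acme Corp", "Globex Inc"],
        ["ACME Corporation", "Globex Inc.", "Initech"],
        PairModel(),
        ask_user=lambda a, b: True,
    )
    for entity_id, records in entities.items():
        print(entity_id, records)
```

In this sketch only the reviewed pairs have their probabilities revised; a fuller variant could rescore every pair with the retrained model before clustering.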