| CPC G06F 16/21 (2019.01) [G05B 13/0265 (2013.01); G06F 16/23 (2019.01); G06F 16/235 (2019.01); G06F 16/2379 (2019.01); G06F 16/2386 (2019.01); G06F 16/24564 (2019.01); G06F 16/2477 (2019.01); G06F 16/282 (2019.01); G06F 16/285 (2019.01); G06F 16/29 (2019.01); G06F 16/313 (2019.01); G06F 16/35 (2019.01); G06F 16/951 (2019.01); G06N 5/022 (2013.01); G06N 20/00 (2019.01); G06Q 10/101 (2013.01); G06Q 30/0261 (2013.01); G06Q 30/0282 (2013.01); G06Q 50/01 (2013.01); H04L 41/14 (2013.01); H04W 4/02 (2013.01); H04W 4/021 (2013.01); H04W 4/025 (2013.01); H04W 4/029 (2018.02); H04W 4/50 (2018.02); H04W 8/08 (2013.01); H04W 8/16 (2013.01); H04W 8/18 (2013.01); H04W 16/24 (2013.01); H04W 64/00 (2013.01); H04W 64/003 (2013.01); H04W 76/38 (2018.02); H04W 88/02 (2013.01); G06F 16/337 (2019.01); H04W 16/00 (2013.01); H04W 16/30 (2013.01); H04W 16/32 (2013.01); H04W 88/00 (2013.01)] | 18 Claims |

|
1. An apparatus comprising:
a processor configured to run one or more modules stored in memory, wherein the one or more modules are configured to:
receive a plurality of pairs of data records;
determine whether one or more pairs of data records are eligible to be clustered, the determination is based upon an analysis of a set of attributes included in the one or more pairs, wherein the analysis is performed by:
converting a first data record in the pair of data records into a first hash;
converting a second data record in the pair of data records into a second hash; and
comparing bits of the first hash and second hash; and
when it is determined that at least one pair of data records can be clustered:
determine a similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records, wherein the similarity value is determined using a machine learning model trained using a supervised learning function operating on ground-truth clusters of data records; and
associate the at least one pair of data records with one or more clusters, each associated with a unique entity, based on the similarity value for the at least one pair of data records.
|
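The three steps recited in claim 1 (a hash-based eligibility check, a learned similarity value, and cluster assignment) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the claim: the SimHash-style fingerprint built from SHA-256 token hashes, the Hamming-distance cutoff `max_distance`, the Jaccard attribute features, the small logistic regression standing in for the "machine learning model trained using a supervised learning function", the union-find clustering, and all function names and sample records. The sketch also enumerates pairs from a list of records, whereas the claim receives the pairs directly.

```python
import hashlib
import math
from itertools import combinations

def simhash(record, bits=64):
    """Fold a record's attribute tokens into one `bits`-wide fingerprint (assumed scheme)."""
    counts = [0] * bits
    for key, value in record.items():
        for token in str(value).lower().split():
            digest = hashlib.sha256(f"{key}={token}".encode()).digest()
            h = int.from_bytes(digest[:8], "big")
            for i in range(bits):
                counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def eligible(a, b, max_distance=24):
    """Compare the bits of the two hashes; the cutoff of 24 is an arbitrary assumption."""
    return bin(simhash(a) ^ simhash(b)).count("1") <= max_distance

def features(a, b, attrs):
    """One Jaccard token-overlap score per attribute (hypothetical feature choice)."""
    out = []
    for attr in attrs:
        x = set(str(a.get(attr, "")).lower().split())
        y = set(str(b.get(attr, "")).lower().split())
        out.append(len(x & y) / len(x | y) if x | y else 0.0)
    return out

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(truth_clusters, attrs, epochs=500, lr=0.5):
    """Fit a logistic regression on pairs labeled from ground-truth clusters."""
    labeled = [(r, k) for k, members in enumerate(truth_clusters) for r in members]
    pairs = [(features(a, b, attrs), 1.0 if ka == kb else 0.0)
             for (a, ka), (b, kb) in combinations(labeled, 2)]
    w, bias = [0.0] * len(attrs), 0.0
    for _ in range(epochs):
        for x, y in pairs:
            err = _sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x))) - y
            bias -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, bias

def similarity(model, a, b, attrs):
    """Similarity value in (0, 1) for one pair of records."""
    w, bias = model
    return _sigmoid(bias + sum(wi * xi for wi, xi in zip(w, features(a, b, attrs))))

def cluster(records, model, attrs, threshold=0.5, max_distance=24):
    """Union eligible pairs whose similarity clears the threshold; return index groups."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        if eligible(a, b, max_distance) and similarity(model, a, b, attrs) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical ground-truth clusters: each inner list is one real-world entity.
ATTRS = ["name", "city"]
TRUTH = [
    [{"name": "joe's pizza", "city": "new york"}, {"name": "joes pizza", "city": "new york"}],
    [{"name": "blue bottle coffee", "city": "oakland"}, {"name": "blue bottle", "city": "oakland"}],
]
MODEL = train(TRUTH, ATTRS)
print(cluster(TRUTH[0] + TRUTH[1], MODEL, ATTRS))
```

The hash comparison acts as an inexpensive filter so that the costlier learned similarity is computed only for pairs that pass it. Each group of indices returned by `cluster` stands for one unique entity. With training data this small the learned weights are not meaningful beyond the toy records shown.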