| CPC G06N 3/088 (2013.01) [G06F 16/285 (2019.01); G06F 18/2155 (2023.01); G06F 18/2178 (2023.01); G06F 18/23213 (2023.01)] | 14 Claims |

|
1. A system for training a machine learning algorithm configured to identify connections between individuals based on clusters in data contained in a database, comprising:
a database housing data;
a processor operably coupled with the database;
a memory device storing computer-executable instructions that when executed cause the processor to:
collect a set of transaction records from the database;
create a first training set comprising the set of transaction records, a set of individual connections, and a set of individual non-connections;
train the machine learning algorithm in a first stage using the first training set based on clusters in the data contained in the database, where the machine learning algorithm provides output data identifying clusters of activity relationships, a group label for each cluster when known, and cluster strength values for each of the individuals for each of the clusters in which they appear;
create a second training set for a second stage of training comprising the first training set and individual non-connections that are incorrectly detected as connections after the first stage of training; and
train the machine learning algorithm in a second stage using the second training set,
wherein the transaction records include transaction data and the clusters of activity relationships include clusters of transaction commonalities, clusters of social affiliation commonalities, and clusters with combinations of transaction and social affiliation commonalities,
and where the machine learning algorithm is initially trained via unsupervised learning using the transaction data in an unlabeled form, and the machine learning algorithm is periodically provided with update training via supervised learning wherein at least some of the clusters identified in the output data have been assigned a group label by a human analyst and the transaction data with group labels is used as a training dataset for the supervised learning,
and further comprising a supplemental communication system operated by the human analyst reviewing the output data, where the supplemental communication system is used by the human analyst to send actionable communications to particular ones of the individuals based on the output data.
|
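
The two training stages recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes each pair of individuals has already been reduced to a small numeric feature vector (a hypothetical shared-transaction fraction and a hypothetical shared-affiliation flag), and it swaps in a tiny logistic-regression classifier for the unspecified machine learning algorithm. The function names `train_logistic`, `predict`, and `two_stage_train` are illustrative only. The clustering output, group labels, and cluster strength values of the claim are not modeled here.

```python
import math

def train_logistic(samples, epochs=200, lr=0.5):
    """Fit a small logistic-regression model with per-sample gradient descent.

    samples: list of (features, label); label 1 = connection, 0 = non-connection.
    Returns (weights, bias). Stand-in for the claim's unspecified algorithm.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - label
            weights = [w - lr * err * x for w, x in zip(weights, features)]
            bias -= lr * err
    return weights, bias

def predict(model, features):
    """Return 1 if the pair is scored as a connection, else 0."""
    weights, bias = model
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if z > 0 else 0

def two_stage_train(first_set, known_non_connections):
    """Stage 1: train on the first training set.
    Stage 2: retrain on the first set plus every known non-connection
    that the stage-1 model incorrectly detected as a connection."""
    stage1_model = train_logistic(first_set)
    false_positives = [f for f in known_non_connections
                       if predict(stage1_model, f) == 1]
    second_set = first_set + [(f, 0) for f in false_positives]
    stage2_model = train_logistic(second_set)
    return stage2_model, second_set, false_positives

# Hypothetical features: (shared-transaction fraction, shared-affiliation flag)
first_training_set = [
    ((0.9, 1.0), 1), ((0.8, 1.0), 1),   # individual connections
    ((0.1, 0.0), 0), ((0.2, 0.0), 0),   # individual non-connections
]
candidate_non_connections = [(0.6, 0.0), (0.15, 0.0), (0.3, 1.0)]

model, second_training_set, mined = two_stage_train(
    first_training_set, candidate_non_connections)
print(len(mined), "non-connections were added for the second stage")
```

In this sketch the second training set is built by appending the misclassified non-connections, labeled as non-connections, to the unchanged first training set, which mirrors the wording "comprising the first training set and individual non-connections that are incorrectly detected as connections". Whether any particular candidate is mined depends on the stage-1 model, so the printed count is not fixed by the claim.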