| CPC G06N 7/01 (2023.01) | 30 Claims |

|
1. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including system instructions operable to cause a computing device to:
obtain a first data set associated with a plurality of nodes to generate one or more sets of networks;
train a first model on the first data set using a first graph to predict relevant links between the plurality of nodes by executing operations comprising:
determine one or more features for one or more links between the plurality of nodes;
determine a target variable indicator for the one or more links between the plurality of nodes using the first graph by executing operations comprising:
determine a set of subgraphs from the first graph;
determine whether each of the one or more links between each node of the plurality of nodes connect within a single subgraph of the set of subgraphs from the first graph;
based on the determination of whether each of the one or more links between each node of the plurality of nodes connect within the single subgraph of the set of subgraphs from the first graph, label the one or more links as intra-community links in the single subgraph of the set of subgraphs from the first graph;
determine whether each of the one or more links between each node of the plurality of nodes connect between at least two subgraphs of the set of subgraphs from the first graph;
based on the determination of whether each of the one or more links between each node of the plurality of nodes connect between the at least two subgraphs of the set of subgraphs from the first graph, label the one or more links as inter-community links in the at least two subgraphs of the set of subgraphs from the first graph;
output the labeled one or more links as the intra-community links in the single subgraph of the set of subgraphs from the first graph; and
output the labeled one or more links as the inter-community links in the at least two subgraphs of the set of subgraphs from the first graph; and
based on the determination of the one or more features and the determination of the target variable indicator for the one or more links between the plurality of nodes using the first graph, train the first model to predict the relevant links of the one or more links between the plurality of nodes, wherein the relevant links comprise the intra-community links;
obtain the first data set or a second data set associated with the plurality of nodes;
determine a first node for the first data set or the second data set associated with the plurality of nodes;
(A) from the first node from the first data set or the second data set associated with the plurality of nodes, execute operations comprising:
(B) determine, for the first node, the one or more features for the one or more links between the plurality of nodes connected to the first node;
(C) based on the determination of the one or more features for the one or more links between the plurality of nodes connected to the first node, apply the trained first model to the one or more links between the plurality of nodes from the first node;
(D) based on the application of the trained first model to the one or more links between the plurality of nodes from the first node, output the relevant links and non-relevant links of the one or more links between the plurality of nodes from the first node and output a trained model variable, wherein the non-relevant links comprise the inter-community links;
(E) based on the output of the relevant links and the non-relevant links of the one or more links between the plurality of nodes from the first node, connect the first node to each node of the plurality of nodes for the relevant links in one or more first sets of generated networks;
based on the output of the trained model variable, optimize the application of the trained first model to the one or more links between the plurality of nodes by automatically computing a first threshold for the trained model variable for one or more factors including network size; and
for each node of the plurality of nodes connected to the first node based on the relevant links, repeat (A) to (E) to iteratively connect each node from the plurality of nodes to the one or more first sets of generated networks for each of the relevant links until the relevant links for connection to the plurality of nodes are not present; and
output the one or more first sets of generated networks in a first graphical user interface, as a first input to an automated analytical process, or as a first input to an investigative system.
|
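For illustration only, the labeling operations and iterative steps (A) to (E) of claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `label_links`, `expand_network`, `score_link`, and `community_of` are hypothetical. The subgraph (community) assignment is taken as given. A lookup on the labels stands in for the trained first model, and the threshold is passed in by hand rather than computed automatically from network size.

```python
from collections import deque


def label_links(links, community_of):
    """Label a link 'intra' if both endpoints share a subgraph, else 'inter'."""
    return {
        (u, v): "intra" if community_of[u] == community_of[v] else "inter"
        for u, v in links
    }


def expand_network(seed, links, score_link, threshold):
    """Grow one network from `seed`, following only links scored as relevant."""
    # Build an undirected adjacency map from the link list.
    neighbors = {}
    for u, v in links:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    network_nodes = {seed}
    network_links = set()
    queue = deque([seed])
    while queue:  # stops once no relevant links remain to follow
        node = queue.popleft()
        for other in sorted(neighbors.get(node, ())):
            # Links scoring below the threshold are treated as non-relevant.
            if score_link(node, other) < threshold:
                continue
            network_links.add(tuple(sorted((node, other))))
            if other not in network_nodes:
                network_nodes.add(other)
                queue.append(other)  # repeat the same steps from this node
    return network_nodes, network_links


# Hypothetical toy data: two triangles joined by the single link c-d.
links = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
labels = label_links(links, community_of)


def score_link(u, v):
    """Stand-in for the trained model: 1.0 for intra links, 0.0 otherwise."""
    label = labels.get((u, v)) or labels.get((v, u))
    return 1.0 if label == "intra" else 0.0


nodes, kept = expand_network("a", links, score_link, threshold=0.5)
```

In this toy run the expansion starting at node `a` stays inside the first triangle, because the only link leaving it (`c`-`d`) is labeled inter-community and scores below the threshold. In a real system the score would come from a model trained on link features, which is where the labels serve as the target variable.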