US 12,406,274 B2
Unsupervised apparatus and method for graphically clustering high dimensional patron clickstream data
Peter Councill, Richmond, VA (US)
Assigned to TRUIST BANK, Charlotte, NC (US)
Filed by Truist Bank, Charlotte, NC (US)
Filed on May 20, 2022, as Appl. No. 17/749,391.
Application 17/749,391 is a continuation of application No. 17/659,407, filed on Apr. 15, 2022.
Prior Publication US 2023/0334513 A1, Oct. 19, 2023
Int. Cl. G06Q 30/0201 (2023.01); G06F 18/23211 (2023.01); G06N 5/022 (2023.01)
CPC G06Q 30/0201 (2013.01) [G06F 18/23211 (2023.01); G06N 5/022 (2013.01)]
20 Claims
 
1. An apparatus for improving efficiency of clickstream data processing comprising:
a processor; and
a memory including instructions that, when executed by the processor, cause the processor to:
extract the clickstream data;
transform the clickstream data into a probability matrix, wherein the clickstream data comprises a plurality of pages and the probability matrix comprises a plurality of entries, each entry of the probability matrix comprising a respective probability of proceeding from a first one of the plurality of pages to a second one of the plurality of pages;
transform the probability matrix into two dimensional data by reducing dimensionality of the probability matrix using a Uniform Manifold Approximation and Projection algorithm (UMAP);
generate, by a Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm based on the two dimensional data, a cluster graph visualizing a plurality of clusters of the two dimensional data;
determine, by the DBSCAN algorithm based on the two dimensional data, a respective center of each cluster of the plurality of clusters;
process, by a K-Nearest Neighbor (KNN) algorithm, the generated cluster graph and the determined centers of each cluster, to determine a respective subset of data points of the two dimensional data closest to the center of each cluster;
determine, based on the subsets of the data points of the two dimensional data determined by the KNN algorithm, a respective edge of each cluster;
shade, based on the determined edges of each cluster, each of the data points within each respective subset to graphically identify each cluster of the plurality of clusters in the cluster graph; and
output the cluster graph on a display.
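
The probability matrix recited in claim 1 can be illustrated with a minimal sketch. It assumes the clickstream data has already been extracted as an ordered list of page names per session, and it estimates each entry as the observed fraction of transitions from one page to the next. The function name `transition_matrix`, the session format, and the sample page names are hypothetical and do not come from the patent; the claim does not state how the probabilities are estimated, so the row normalization used here is only one plausible reading.

```python
def transition_matrix(sessions):
    """Estimate page-to-page transition probabilities from clickstream sessions.

    sessions: iterable of page-name lists, each in visit order (assumed format).
    Returns (pages, matrix), where matrix[i][j] is the observed fraction of
    departures from pages[i] that proceeded directly to pages[j].
    """
    # Fix a stable page ordering so rows and columns are reproducible.
    pages = sorted({page for session in sessions for page in session})
    index = {page: i for i, page in enumerate(pages)}

    # Count each consecutive (current page, next page) pair.
    counts = [[0] * len(pages) for _ in pages]
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[index[src]][index[dst]] += 1

    # Normalize each row; pages never followed by another page keep a zero row.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return pages, matrix


# Hypothetical sample sessions for illustration only.
sessions = [
    ["home", "login", "accounts"],
    ["home", "login", "transfer"],
    ["home", "rates"],
]
pages, matrix = transition_matrix(sessions)
for page, row in zip(pages, matrix):
    print(page, [round(p, 3) for p in row])
```

In this sample, two of the three departures from "home" go to "login", so that entry is 2/3, and the "login" row splits evenly between "accounts" and "transfer". A matrix of this kind would be the input to the dimensionality reduction step; the UMAP, DBSCAN, and KNN steps of the claim are not sketched here.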