US 12,353,966 B2
Spectral clustering of high-dimensional data
Vasileios Kalantzis, White Plains, NY (US); and Lior Horesh, North Salem, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jul. 23, 2021, as Appl. No. 17/384,197.
Prior Publication US 2023/0045753 A1, Feb. 9, 2023
Int. Cl. G06N 20/00 (2019.01); G06F 16/901 (2019.01); G16H 50/70 (2018.01)
CPC G06N 20/00 (2019.01) [G06F 16/9024 (2019.01); G16H 50/70 (2018.01)]
6 Claims
 
1. A machine learning system comprising:
a processor;
a memory device coupled with the processor;
a sensor coupled with the processor;
the processor configured at least to:
receive data from the sensor;
create a graph Laplacian of the data and store it in the memory device;
compute a spectral characteristic by applying density of states, and detect spectral gaps in the spectral characteristic in an unsupervised manner to determine the number of clusters r, r being a hyper-parameter for machine learning;
compute a range space of a rational matrix of the graph Laplacian, r being determined based on a dynamically increasing projection subspace for capturing eigenvalues, in which the subspace is built by computing the range space, and without requiring estimation of the eigenvalues located inside a disk in a complex plane;
train an unsupervised machine learning model based on the hyper-parameter r to cluster the received data, wherein to train the unsupervised machine learning model, the processor is configured to perform K-means clustering on the range space of the rational matrix of the graph Laplacian using r as the number of clusters, the K-means clustering trained to return r clusters of the received data;
the data including computer network traffic data, wherein the unsupervised machine learning model is trained to classify the computer network traffic data into r clusters of security levels; and
run the trained unsupervised machine learning model on incoming data traffic in real-time to detect a security level cluster to which the incoming data traffic belongs, and based on the security level cluster, filter the incoming data traffic to prevent unwanted data from entering a target computer system.
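
For illustration only, the clustering steps recited in claim 1 (graph Laplacian, spectral-gap selection of r, K-means on a spectral subspace) can be sketched on toy data as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: a dense eigendecomposition stands in for both the density-of-states estimate and the range space of the rational matrix, so it does estimate eigenvalues, which the claim avoids. The Gaussian similarity graph, the function names (graph_laplacian, estimate_r, kmeans), the parameters sigma and max_r, and the synthetic three-blob data are all hypothetical choices made for the example.

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Symmetric normalized Laplacian of a Gaussian similarity graph (assumed graph model)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))                   # edge weights
    np.fill_diagonal(W, 0.0)                               # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1))
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def estimate_r(eigvals, max_r=10):
    """Pick r at the largest gap among the smallest eigenvalues.

    Stand-in for gap detection on a density-of-states estimate.
    """
    gaps = np.diff(eigvals[: max_r + 1])
    return int(np.argmax(gaps)) + 1

def kmeans(Y, r, iters=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, r):
        d = np.min([((Y - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        new = np.array([Y[labels == k].mean(0) if np.any(labels == k) else centers[k]
                        for k in range(r)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Synthetic stand-in for sensor data: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in blob_centers])

L = graph_laplacian(X)
eigvals, eigvecs = np.linalg.eigh(L)      # exact spectrum, ascending (illustration only)
r = estimate_r(eigvals)                   # number of clusters from the spectral gap
Y = eigvecs[:, :r]                        # subspace spanned by the r smallest eigenvectors
Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)   # row-normalize the embedding
labels = kmeans(Y, r)
print("estimated r:", r)
```

The real-time filtering step of the claim would additionally need a way to assign incoming traffic to one of the r clusters; that step is not sketched here because the claim does not specify how new points are mapped into the spectral subspace.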