US 11,755,932 B2
Online unsupervised anomaly detection
Danny Butvinik, Haifa (IL)
Assigned to Actimize LTD., Ra'anana (IL)
Filed by Actimize LTD., Ra'anana (IL)
Filed on Apr. 23, 2020, as Appl. No. 16/856,071.
Prior Publication US 2021/0334673 A1, Oct. 28, 2021
Int. Cl. G06Q 30/00 (2023.01); G06N 5/04 (2023.01)
CPC G06N 5/04 (2013.01)
15 Claims
1. A computerized-method for real-time detection of anomalous data, by processing high-speed streaming data, said computerized-method comprising:
in a computerized-system comprising a processor and a memory, receiving by the processor, a data stream comprised of unlabeled data points,
operating by the processor an Anomalous Data Detection (ADD) module, said ADD module is configured to:
(i) receive: k, X, d, threshold, and n,
wherein k is a number of data point neighbors for each data point,
wherein X is a number of data points in a predetermined period of time,
wherein d is a number of dimensions of each data point,
wherein n is a number of data points that said ADD module is operating on, in a predefined time unit;
(ii) prepare a dataset having n data points from the received X data points; and
(iii) identify one or more data points, from the received data stream, as outliers to send an alert with details related to the identified outliers thus, dynamically evaluating local outliers in the received data stream,
wherein the preparation of the dataset is comprising:
(ii.a) fetching X data points from a data storage device, according to at least one preconfigured criterion;
(ii.b) retrieving random n data points from the retrieved X data points to yield a dataset;
(ii.c) for each data point in the dataset:
ii.c.i applying at least one classification algorithm to yield a set of results from each applied classification algorithm and to determine k data points neighbors;
ii.c.ii marking the data points in the set of results as related to the dataset;
ii.c.iii calculating a local density, wherein said local density is corresponding to a calculated heterogenous nearest neighbors Local Outlier Factor (hLOF) area of the data point;
ii.c.iv associating an outlier counter and zeroing said outlier counter; and
ii.c.v marking said data point as a potential-outlier by increasing by 1 each associated outlier counter that its calculated local density is higher than 1.
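
For illustration only, steps (ii.b) through (ii.c.v) can be sketched in Python under several assumptions that are not taken from the claim. The claim does not define hLOF in this excerpt, so the sketch substitutes the standard Local Outlier Factor (LOF) score as a stand-in. A brute-force nearest-neighbor search replaces the "at least one classification algorithm", step (ii.a) is assumed to have already produced the list `fetched`, and the names `k_nearest`, `lof_scores`, and `prepare_dataset` are hypothetical.

```python
import math
import random


def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbors of points[i]."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Standard LOF score per point (brute force); a stand-in for hLOF."""
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neighbors]  # distance to the k-th neighbor

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicates

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


def prepare_dataset(fetched, n, k, seed=None):
    """Sketch of steps (ii.b) to (ii.c.v) on already-fetched data points."""
    rng = random.Random(seed)
    dataset = rng.sample(fetched, n)   # (ii.b) random n of the X fetched points
    scores = lof_scores(dataset, k)    # (ii.c.i), (ii.c.iii) neighbors and score
    counters = [0] * n                 # (ii.c.iv) one zeroed counter per point
    for i, score in enumerate(scores):
        if score > 1.0:                # (ii.c.v) mark as potential outlier
            counters[i] += 1
    return dataset, scores, counters


if __name__ == "__main__":
    # Hypothetical data: a 5x5 grid of d=2 points plus one distant point.
    fetched = [(float(x), float(y)) for x in range(5) for y in range(5)]
    fetched.append((50.0, 50.0))
    dataset, scores, counters = prepare_dataset(fetched, n=len(fetched), k=3, seed=0)
    for point, score, count in zip(dataset, scores, counters):
        if count:
            print(point, round(score, 2))
```

In this sketch a point is flagged when its score exceeds 1, mirroring the comparison in step (ii.c.v). The brute-force search costs O(n^2) distance computations, so it is suitable only for small n.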