US 11,860,971 B2
Anomaly detection
Teodora Buda, Dublin (IE); Hitham Ahmed Assem Aly Salama, Dublin (IE); Bora Caglayan, Dublin (IE); and Faisal Ghaffar, Castle Dunboyne (IE)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on May 24, 2018, as Appl. No. 15/988,506.
Prior Publication US 2019/0362245 A1, Nov. 28, 2019
Int. Cl. G06N 5/04 (2023.01); G06F 17/18 (2006.01); G06N 7/01 (2023.01)
CPC G06F 17/18 (2013.01) [G06N 5/04 (2013.01); G06N 7/01 (2023.01)]
22 Claims
 
1. A computer-implemented method for determining whether a data element, having a value, of a time-series dataset is an outlier, the method comprising:
obtaining prediction data, for predicting a value of the data element, from first data of the time-series dataset that temporally precedes the data element;
predicting, using the prediction data, a predicted value of the data element;
obtaining an error value for the data element representative of a difference between the value and the predicted value of the data element;
obtaining historic error values for the time-series dataset, each historic error value being representative of a difference between a value and a predicted value of a second data element of the time-series dataset that temporally precedes the data element;
obtaining, based on one or more of the historic error values, a threshold value for the error value of the data element defining error values for the data element that are considered to be outliers, wherein obtaining the threshold value comprises:
determining a predetermined number based on a percentage of error values expected to be outliers;
multiplying a statistical measure of the historic error values by the predetermined number to produce a result, wherein the statistical measure includes one of a mean, median, mode, and standard deviation; and
determining the threshold value based on the result, wherein a different threshold value is determined for each data element of the time-series dataset; and
determining whether the data element is an outlier based on a comparison of the threshold value with the error value of the data element.
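
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the moving-average predictor, the use of a signed error, the choice of standard deviation as the statistical measure, the normal-quantile mapping from the expected outlier percentage to the multiplier, and all names (`multiplier_from_percentage`, `detect_outliers`, `window`, `min_history`) are hypothetical choices made for the example.

```python
from statistics import NormalDist, mean, pstdev


def multiplier_from_percentage(expected_outlier_pct):
    """Map the percentage of errors expected to be outliers to a multiplier.

    Assumption: errors are roughly normal, so a two-sided quantile is used
    (5% expected outliers gives a multiplier of about 1.96).
    """
    return NormalDist().inv_cdf(1.0 - expected_outlier_pct / 200.0)


def detect_outliers(series, window=3, expected_outlier_pct=5.0, min_history=3):
    """Return (index, error, threshold, is_outlier) for each scored element."""
    k = multiplier_from_percentage(expected_outlier_pct)
    historic_errors = []  # errors of temporally preceding elements
    results = []
    for i in range(window, len(series)):
        # Prediction data: the `window` values preceding element i.
        # Assumed predictor: their arithmetic mean.
        predicted = mean(series[i - window:i])
        # Error value: signed difference between actual and predicted value.
        error = series[i] - predicted
        if len(historic_errors) >= min_history:
            # Threshold: statistical measure (standard deviation here) of the
            # historic errors multiplied by the predetermined number. It is
            # recomputed for every element, so each element gets its own.
            threshold = k * pstdev(historic_errors)
            is_outlier = abs(error) > threshold
        else:
            # Too little history to form a threshold; element is not flagged.
            threshold = None
            is_outlier = False
        results.append((i, error, threshold, is_outlier))
        historic_errors.append(error)
    return results


if __name__ == "__main__":
    data = [10, 10.5, 9.5, 10, 10.5, 9.5, 10, 10.5, 9.5, 10, 50, 10]
    for index, error, threshold, flagged in detect_outliers(data):
        print(index, round(error, 3), threshold, flagged)
```

Because the threshold is derived from the errors seen so far, it adapts as the series evolves; in this sketch a flagged element's error is still appended to the history, which widens later thresholds. Whether outlier errors should be excluded from the history is a design choice the claim does not settle.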