CPC G06N 3/08 (2013.01) [G06F 11/0727 (2013.01); G06F 11/079 (2013.01); G06N 3/04 (2013.01)] | 20 Claims
1. A method comprising:
training, based on a plurality of timeseries, a plurality of anomaly detectors, wherein:
each anomaly detector in the plurality of anomaly detectors is configured with a respective distinct contamination factor,
each timeseries in the plurality of timeseries comprises a temporal sequence of datapoints that characterize a device, and
each datapoint in the plurality of timeseries comprises a respective label that indicates whether the device failed when the datapoint occurred;
detecting, by each anomaly detector of the plurality of anomaly detectors after said training:
a plurality of anomalous datapoints in the plurality of timeseries, wherein a size of the plurality of anomalous datapoints is proportional to said contamination factor of the anomaly detector,
a respective healthy count of the plurality of anomalous datapoints in timeseries not containing a datapoint whose label indicates the device failed, and
a respective unhealthy count of the plurality of anomalous datapoints in timeseries containing a datapoint whose label indicates the device failed;
detecting, for a particular anomaly detector of the plurality of anomaly detectors, that a magnitude of difference between the respective healthy count and the respective unhealthy count is less than a threshold;
oversampling, based on said contamination factor of the particular anomaly detector, an oversampled plurality of anomalous datapoints from the anomalous datapoints of the plurality of anomaly detectors; and
training, based on the oversampled plurality of anomalous datapoints, a classifier;
wherein the method is performed by one or more computers.
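The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`detect`, `select_detector`, `oversample`, `train_classifier`, the `points` records, and the sample data) is hypothetical. The anomaly detector is a toy stand-in that flags the fraction of datapoints farthest from the median, and the classifier is a toy nearest-centroid rule. The claim does not say how the contamination factor drives the oversampling, so the sketch assumes the factor only determines which detector's anomalies are drawn from, and it samples with replacement up to a caller-chosen size.

```python
import random
from statistics import mean, median


def detect(points, contamination):
    """Toy stand-in detector: flag the fraction of points farthest from the median.

    The number of flagged points is proportional to the contamination factor.
    """
    center = median(p["value"] for p in points)
    k = max(1, round(contamination * len(points)))
    ranked = sorted(points, key=lambda p: abs(p["value"] - center), reverse=True)
    return ranked[:k]


def select_detector(points, factors, threshold):
    """Return the first factor whose healthy and unhealthy anomaly counts differ by < threshold."""
    # A timeseries is "unhealthy" if any of its datapoints is labeled as a failure.
    failed_series = {p["series"] for p in points if p["failed"]}
    for factor in factors:
        anomalies = detect(points, factor)
        unhealthy = sum(1 for p in anomalies if p["series"] in failed_series)
        healthy = len(anomalies) - unhealthy
        if abs(healthy - unhealthy) < threshold:
            return factor, anomalies
    return None, []


def oversample(anomalies, target_size, seed=0):
    """Assumed oversampling scheme: draw with replacement from the selected anomalies."""
    rng = random.Random(seed)
    return [rng.choice(anomalies) for _ in range(target_size)]


def train_classifier(samples):
    """Toy stand-in classifier: nearest centroid over the single 'value' feature."""
    centroids = {
        label: mean(p["value"] for p in samples if p["failed"] == label)
        for label in {p["failed"] for p in samples}
    }
    return lambda value: min(centroids, key=lambda label: abs(centroids[label] - value))


# Hypothetical data: four timeseries of five datapoints; series 1 ends in a failure.
raw = {
    0: [1.0, 1.0, 1.1, 0.9, 5.0],
    1: [1.0, 1.2, 6.0, 7.0, 9.0],
    2: [1.0, 1.0, 1.0, 0.9, 4.0],
    3: [1.0, 1.1, 1.0, 1.0, 3.0],
}
failures = {(1, 4)}  # (series id, position) pairs labeled as device failures
points = [
    {"series": s, "value": v, "failed": (s, i) in failures}
    for s, values in raw.items()
    for i, v in enumerate(values)
]

factor, anomalies = select_detector(points, [0.1, 0.2, 0.3], threshold=2)
training_set = oversample(anomalies, target_size=12)
classifier = train_classifier(training_set)
```

With this sample data, the detectors configured with 0.1 and 0.2 flag mostly datapoints from the failed series, while the detector configured with 0.3 flags three healthy and three unhealthy datapoints, so it is the one selected. In practice a library detector that accepts a contamination parameter would replace the toy `detect` function.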