| CPC G06F 11/079 (2013.01) [G05B 23/0254 (2013.01); G06F 9/451 (2018.02); G06N 20/20 (2019.01)] | 20 Claims |

|
1. A system, comprising:
one or more computing devices;
wherein the one or more computing devices include instructions that upon execution on or across the one or more computing devices cause the one or more computing devices to:
identify a plurality of metrics associated with an application obtained from different, respective sources, wherein the plurality of metrics comprise:
one or more resource-specific metrics for one or more computer resources executing the application; and
one or more application-logic based metrics for performance of one or more operations of the application;
determine an anomaly detection plan for the application, wherein the anomaly detection plan indicates:
(a) one or more probabilistic forecasting models, including a first probabilistic forecasting model which generates a predicted probability distribution of future values of one or more time series of the plurality of metrics associated with the application,
(b) one or more prediction lead times for which measured values of the plurality of metrics are to be analyzed with respect to predicted probability distributions of the plurality of metrics,
(c) at least a first mapping between a range subdivision of a predicted probability distribution for a particular metric and an anomaly score contribution computed with respect to the particular metric, and
(d) that an anomaly score for the application is to be based at least in part on group-wise analysis of at least a first group of metrics of the application comprising the one or more resource-specific metrics and the one or more application-logic based metrics;
execute the anomaly detection plan, wherein to execute the anomaly detection plan, instructions cause the one or more computing devices to generate the anomaly score of the application with respect to a set of observed values of the plurality of metrics of the application including the one or more resource-specific metrics and the one or more application-logic based metrics, wherein generation of the anomaly score comprises:
aggregating a plurality of anomaly score contributions, including:
(a) a first anomaly score contribution associated with a divergence of values of the one or more resource-specific metrics and the one or more application-logic based metrics; and
(b) a second anomaly score contribution obtained using the first mapping and a particular prediction lead time of the one or more prediction lead times; and
cause, in response to a determination that the anomaly score of the application exceeds a threshold, one or more anomaly response operations to be initiated.
|
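
The scoring flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation; it is one possible reading under stated assumptions. Every name here (`RANGE_TO_SCORE`, `range_contribution`, `anomaly_score`, the metric names, the 5-minute lead time, the equal weights, and the 0.5 threshold) is hypothetical. The sketch also assumes that each probabilistic forecast is Gaussian, that the "range subdivision" is a set of two-sided tail-probability bands, and that "divergence" means the gap between the mean standardized deviations of the two metric groups.

```python
from statistics import NormalDist, mean

# Hypothetical mapping for element (c): two-sided tail-probability bands of the
# predicted distribution, each paired with an anomaly score contribution.
RANGE_TO_SCORE = [(0.001, 1.0), (0.01, 0.6), (0.05, 0.3)]


def range_contribution(observed, mu, sigma):
    """Map the band of the predicted distribution containing the observation to a score."""
    cdf = NormalDist(mu, sigma).cdf(observed)
    tail = 2 * min(cdf, 1 - cdf)  # two-sided tail probability
    for upper_bound, score in RANGE_TO_SCORE:
        if tail <= upper_bound:
            return score
    return 0.0  # observation lies in the central, unremarkable range


def anomaly_score(forecasts, observed, resource_metrics, app_metrics, lead_time):
    """Aggregate a group-wise divergence term and a mapping-based term."""
    predicted = forecasts[lead_time]  # {metric: (mean, stddev)} for this lead time
    z = {m: (observed[m] - predicted[m][0]) / predicted[m][1] for m in observed}

    # Assumed divergence measure: the two groups deviate from forecast by
    # different amounts; the gap is scaled and capped at 1.0.
    gap = abs(mean([z[m] for m in resource_metrics]) - mean([z[m] for m in app_metrics]))
    divergence = min(gap / 6.0, 1.0)

    # Mapping-based contribution: worst band across all metrics.
    mapped = max(range_contribution(observed[m], *predicted[m]) for m in observed)

    # Assumed aggregation: equal-weight average of the two contributions.
    return 0.5 * divergence + 0.5 * mapped


# Forecasts for a 5-minute lead time, as (mean, stddev) pairs.
forecasts = {5: {"cpu_util": (50.0, 5.0), "checkout_latency_ms": (200.0, 20.0)}}
resource_metrics = ["cpu_util"]
app_metrics = ["checkout_latency_ms"]
THRESHOLD = 0.5

observed = {"cpu_util": 52.0, "checkout_latency_ms": 300.0}
score = anomaly_score(forecasts, observed, resource_metrics, app_metrics, lead_time=5)
if score > THRESHOLD:
    print("anomaly response initiated")
```

In this example the latency observation sits five standard deviations above its forecast while CPU utilization is close to its forecast, so both the mapping-based contribution and the divergence contribution are large and the score exceeds the threshold. A deployment would likely use learned forecasting models and tuned weights rather than the fixed values assumed here.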