CPC G06F 11/3476 (2013.01) [G06F 11/079 (2013.01); G06F 11/3409 (2013.01); G06N 7/01 (2023.01); G06N 7/02 (2013.01); G06F 11/0793 (2013.01); G06F 11/3447 (2013.01)] | 24 Claims |
1. A non-transitory machine-readable medium storing one or more sequences of instructions for proactively avoiding performance issues in computing environments, wherein execution of said one or more instructions by one or more processors contained in a digital processing system cause said digital processing system to perform the actions of:
forming in a memory of said digital processing system, a causal dependency graph representing the usage dependencies among a plurality of components deployed on a plurality of nodes in a computing environment during processing of prior user requests received on a network, wherein a first component has a usage dependency on a second component and is represented connected by an edge of said causal dependency graph if said first component is designed to invoke said second component during processing of said prior requests, wherein said first component and said second component are contained in said plurality of components,
wherein each component of said plurality of components is associated with a corresponding set of key performance indicators (KPIs), wherein each KPI measures usage of a corresponding resource in the component during processing of prior user requests;
training a probabilistic model with prior incidents that have occurred in said plurality of components, wherein said probabilistic model correlates outliers of one or more KPIs in associated components of said plurality of components to prior incidents,
wherein an outlier for a KPI represents a value outside of an acceptable range of values for the KPI,
wherein a prior incident represents a failure or defect of a component of said plurality of components, which required a corresponding corrective action to be performed,
wherein said training comprises determining said correlation based on said causal dependency graph;
detecting, by an outlier detector module, the occurrence of a set of outliers for a first set of KPIs during the processing of user requests, said first set of KPIs being contained in said plurality of KPIs;
identifying by a predictor module, an imminent performance issue likely to occur in a first component of said plurality of components based on said probabilistic model and said set of outliers, said first component being deployed in a first node of said plurality of nodes; and
sending to said first node, by a preventive action generator module, a command, which upon performance on said first node causes change in allocation of one or more resources in said first node to avoid the occurrence of said imminent performance issue in said first component deployed on said first node.
|