| CPC G06F 11/079 (2013.01) [G06F 11/0709 (2013.01); G06F 11/076 (2013.01); G06F 11/3447 (2013.01)] | 17 Claims |

|
1. A computer-implemented method for detecting trigger points to identify root cause failure and fault events in a microservice system, the method comprising:
collecting, by a monitoring agent, entity metrics data and system key performance indicator (KPI) data from the microservice system;
integrating the entity metrics data and the KPI data;
constructing an initial system state space;
detecting system state changes by calculating a distance between current batch data and an initial state;
dividing a status of the microservice system into different states;
learning a causal graph over the system state changes to detect pods and nodes most likely to cause the root cause failure and fault events using a disentangle graph learning-based incremental discovery framework; and
mitigating damage to the microservice system from the failure and fault events using the detected pods and nodes.
|
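The state-change detection step of claim 1 (calculating a distance between current batch data and an initial state) can be illustrated with a minimal sketch. The claim does not specify the distance measure, the batch representation, or a decision rule, so everything here is an assumption made for illustration: each batch is a list of equal-length metric vectors, the distance is Euclidean between batch centroids, and a fixed `threshold` decides whether a change occurred. The names `centroid` and `detect_state_changes` are hypothetical and do not come from the patent.

```python
import math
from statistics import fmean


def centroid(batch):
    """Column-wise mean of a batch of equal-length metric vectors."""
    return [fmean(column) for column in zip(*batch)]


def detect_state_changes(initial_batch, batches, threshold):
    """Return the indices of batches that lie far from the initial state.

    Assumption: "distance" is the Euclidean distance between the centroid
    of a batch and the centroid of the initial-state batch, and a batch
    counts as a state change when that distance exceeds `threshold`.
    """
    baseline = centroid(initial_batch)
    changed = []
    for index, batch in enumerate(batches):
        if math.dist(centroid(batch), baseline) > threshold:
            changed.append(index)
    return changed


# Illustrative data only: each row is [cpu_utilization, latency_ms].
initial = [[0.2, 100.0], [0.3, 110.0]]
incoming = [
    [[0.25, 105.0], [0.25, 105.0]],  # close to the initial state
    [[0.90, 400.0], [0.95, 420.0]],  # far from the initial state
]
changed_batches = detect_state_changes(initial, incoming, threshold=10.0)
```

In this sketch the unscaled latency column dominates the distance; a practical variant would likely normalize each metric first. The causal graph learning step of the claim is not covered here.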