US 12,436,835 B2
Trigger point detection for online root cause analysis and system fault diagnosis
Zhengzhang Chen, Princeton Junction, NJ (US); Haifeng Chen, West Windsor, NJ (US); Liang Tong, Lawrenceville, NJ (US); and Dongjie Wang, Orlando, FL (US)
Assigned to NEC Corporation, Tokyo (JP)
Filed by NEC Laboratories America, Inc., Princeton, NJ (US)
Filed on Jul. 26, 2023, as Appl. No. 18/359,288.
Claims priority of provisional application 63/442,155, filed on Jan. 31, 2023.
Claims priority of provisional application 63/397,955, filed on Aug. 15, 2022.
Prior Publication US 2024/0054043 A1, Feb. 15, 2024
Int. Cl. G06F 11/07 (2006.01); G06F 11/34 (2006.01)
CPC G06F 11/079 (2013.01) [G06F 11/0709 (2013.01); G06F 11/076 (2013.01); G06F 11/3447 (2013.01)]
17 Claims
1. A computer-implemented method for detecting trigger points to identify root cause failure and fault events in a microservice system, the method comprising:
collecting, by a monitoring agent, entity metrics data and system key performance indicator (KPI) data from the microservice system;
integrating the entity metrics data and the KPI data;
constructing an initial system state space;
detecting system state changes by calculating a distance between current batch data and an initial state;
dividing a status of the microservice system into different states;
learning a causal graph over the system state changes to detect pods and nodes most likely to cause the root cause failure and fault events using a disentangle graph learning-based incremental discovery framework; and
mitigating damage to the microservice system from the failure and fault events using the detected pods and nodes.
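
The state-change detection step recited above (calculating a distance between current batch data and an initial state) can be illustrated with a minimal sketch. The claim does not specify a distance measure, so this example assumes a standardized Euclidean distance between per-metric batch means and the initial-state means. The function names, the threshold value, and the sample metrics are all hypothetical and chosen only for illustration; this is not the patented implementation.

```python
import math
from statistics import mean, pstdev


def fit_initial_state(rows):
    """Summarize the initial state as per-metric means and standard deviations.

    rows: list of samples, each a list of metric values (hypothetical layout).
    """
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # Fall back to 1.0 when a metric is constant, to avoid division by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds


def batch_distance(batch, state):
    """Standardized Euclidean distance between batch means and the initial state."""
    means, stds = state
    cols = list(zip(*batch))
    return math.sqrt(
        sum(((mean(c) - m) / s) ** 2 for c, m, s in zip(cols, means, stds))
    )


def detect_trigger(batches, state, threshold=3.0):
    """Return the index of the first batch whose distance exceeds the threshold.

    The threshold is an assumed tuning parameter; returns None if no batch
    deviates enough to count as a state change.
    """
    for i, batch in enumerate(batches):
        if batch_distance(batch, state) > threshold:
            return i
    return None


# Illustrative data: two metrics per sample (e.g. CPU usage and latency).
initial_rows = [[10.0, 100.0], [12.0, 110.0], [11.0, 105.0], [9.0, 95.0]]
state = fit_initial_state(initial_rows)

batches = [
    [[10.0, 101.0], [11.0, 104.0]],  # close to the initial state
    [[30.0, 300.0], [32.0, 310.0]],  # far from the initial state
]
trigger_index = detect_trigger(batches, state)
```

In this sketch the first batch has the same means as the initial state, so its distance is zero, while the second batch deviates strongly and is flagged as the trigger point. A real system would likely need a more robust distance and an adaptive threshold.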