US 11,972,382 B2
Root cause identification and analysis
Hongtan Sun, Armonk, NY (US); Muhammed Fatih Bulut, New York, NY (US); Pritpal S. Arora, Bangalore (IN); Klaus Koenig, Essenheim (DE); Maja Vukovic, New York, NY (US); and Naga A. Ayachitula, Elmsford, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Feb. 22, 2019, as Appl. No. 16/282,565.
Prior Publication US 2020/0272973 A1, Aug. 27, 2020
Int. Cl. G06Q 10/0639 (2023.01); G06N 20/00 (2019.01)
CPC G06Q 10/06393 (2013.01) [G06N 20/00 (2019.01)] 25 Claims
1. A system comprising:
a processing unit operatively coupled to computer memory; and
an evaluator for use in an information technology (IT) infrastructure having a plurality of physical hardware domains, each domain having key performance indicator (KPI) data, the evaluator operatively coupled to and executable by the processing unit to:
dynamically monitor the physical hardware domains of the IT infrastructure, and reflect one or more changes of one or more of the physical hardware domains in one or more machine learning (ML) models;
identify a first KPI related to a technical health issue of one or more of the dynamically monitored physical hardware domains, the first KPI quantifying a performance of a first IT component;
perform a root cause analysis (RCA) for the identified first KPI, including to:
identify a first KPI classification associated with the first KPI;
access a knowledge graph (KG) from a knowledge base stored in the computer memory, the KG corresponding to the identified first KPI classification;
leverage the KG to selectively identify the root cause corresponding to the first KPI;
if the root cause is selectively identified, leverage the KG to identify a second KPI associated with classification of the selectively identified root cause of the first KPI classification, and evaluate a strength of a correlation between the first KPI and the second KPI, the second KPI quantifying a performance of a second IT component; and
if the root cause is not selectively identified, leverage at least one of the one or more ML models using artificial intelligence (AI) to carry out a time series analysis of a series of KPIs over time to selectively identify at least one KPI that is proximally related to the first KPI, identify a classification of the proximally related KPI to diagnose the root cause of the first KPI, and provide a strength of the correlation between the first KPI and the proximally related KPI;
the at least one of the one or more ML models configured to dynamically amend the KG to reflect the diagnosed root cause, wherein a first node of the KG represents the first KPI classification, a second node of the KG represents either the second KPI classification or the proximally related KPI classification, and an edge connecting the first node and the second node represents the correlation;
generate a diagnosis of the technical health issue within the IT environment based on the strength of the correlation between the first KPI and either the second KPI or the proximally related KPI; and
utilize the correlation to manage the physical infrastructure, including resolve the technical health issue.
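Claim 1 recites identifying a first KPI related to a technical health issue of a dynamically monitored physical hardware domain, but it does not say how the issue is detected. The following is a minimal Python sketch that assumes a simple statistical rule stands in for that step: the newest sample of each KPI is compared against that KPI's own history using a z-score. The `KpiSeries` record, the `flag_unhealthy_kpis` function, the KPI names, and the 3.0 threshold are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class KpiSeries:
    """Hypothetical KPI record: samples on a shared clock, oldest first."""
    name: str            # hypothetical identifier, e.g. "db01.cpu_util"
    classification: str  # hypothetical KPI class, e.g. "cpu"
    values: list[float]


def flag_unhealthy_kpis(series: list[KpiSeries], z_threshold: float = 3.0) -> list[str]:
    """Return names of KPIs whose newest sample deviates from the mean of
    the earlier samples by more than z_threshold population std devs."""
    flagged = []
    for s in series:
        history, latest = s.values[:-1], s.values[-1]
        if len(history) < 2:
            continue  # not enough history to estimate a baseline
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is treated as anomalous.
            if latest != mu:
                flagged.append(s.name)
            continue
        if abs(latest - mu) / sigma > z_threshold:
            flagged.append(s.name)
    return flagged
```

A CPU series that sits near 50 and then jumps to 95 is flagged, while a disk series that alternates between 10 and 11 is not. A production detector would more likely use a learned baseline, which is closer to the ML models the claim describes.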
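The knowledge-graph branch of the claimed root cause analysis can be illustrated with a short sketch. It rests on several assumptions that the claim does not state. The KG is modeled as a plain dictionary that maps a KPI classification to neighboring classifications with numeric edge weights. The candidate root cause is taken to be the neighbor with the strongest edge. Correlation strength is measured with the Pearson coefficient. All names (`KnowledgeGraph`, `kg_root_cause`, `rca_with_kg`, and the KPI identifiers) are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional

# Hypothetical KG shape: KPI classification -> {neighbor classification: edge weight}.
KnowledgeGraph = dict[str, dict[str, float]]


@dataclass
class KpiSeries:
    """Hypothetical KPI record: samples on a shared clock, oldest first."""
    name: str
    classification: str
    values: list[float]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation over the common prefix; 0.0 if undefined."""
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    return sxy / sqrt(sxx * syy)


def kg_root_cause(kg: KnowledgeGraph, first_class: str) -> Optional[str]:
    """Return the neighbor classification with the strongest edge, or None."""
    edges = kg.get(first_class, {})
    if not edges:
        return None
    return max(edges, key=lambda cls: abs(edges[cls]))


def rca_with_kg(first: KpiSeries, inventory: list[KpiSeries],
                kg: KnowledgeGraph) -> Optional[tuple[str, str, float]]:
    """Return (root-cause classification, second KPI name, correlation),
    or None when the KG cannot identify a root cause."""
    root_class = kg_root_cause(kg, first.classification)
    if root_class is None:
        return None  # caller falls back to time series analysis
    second = next((k for k in inventory
                   if k.classification == root_class and k.name != first.name), None)
    if second is None:
        return None  # no monitored KPI carries the root-cause classification
    strength = pearson(first.values, second.values)
    kg[first.classification][root_class] = strength  # edge now records the measured r
    return root_class, second.name, strength
```

A return value of `None` signals that the KG did not yield a root cause. This corresponds to the claim's fallback to ML-based time series analysis. Writing the measured coefficient back onto the edge is one possible reading of the claim's requirement that an edge of the KG represent the correlation.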
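When the KG does not identify a root cause, the claim falls back to a time series analysis that finds a KPI "proximally related" to the first KPI. The patent attributes this analysis to ML models without specifying them. The sketch below substitutes a simpler technique that is plainly not ML: lagged Pearson cross-correlation, in which each candidate KPI may lead the first KPI by up to `max_lag` samples. It assumes that all series are sampled on a common clock starting at the same timestamp. `most_proximal_kpi`, `amend_kg`, `min_overlap`, and the KPI names are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class KpiSeries:
    """Hypothetical KPI record: samples on a shared clock, oldest first."""
    name: str
    classification: str
    values: list[float]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation over the common prefix; 0.0 if undefined."""
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def most_proximal_kpi(
    first: KpiSeries,
    candidates: list[KpiSeries],
    max_lag: int = 3,
    min_overlap: int = 3,
) -> Optional[tuple[KpiSeries, int, float]]:
    """Return (candidate, lag, r) with the largest |r|, where the candidate at
    time t is compared with the first KPI at time t + lag (candidate leads)."""
    best: Optional[tuple[KpiSeries, int, float]] = None
    for cand in candidates:
        if cand.name == first.name:
            continue
        n = min(len(first.values), len(cand.values))
        for lag in range(max_lag + 1):
            if n - lag < min_overlap:
                break  # too few overlapping samples at this and larger lags
            r = pearson(cand.values[: n - lag], first.values[lag:n])
            if best is None or abs(r) > abs(best[2]):
                best = (cand, lag, r)
    return best


def amend_kg(kg: dict[str, dict[str, float]], first_class: str,
             related_class: str, strength: float) -> None:
    """Add or update the edge first_class -> related_class, weighted by r."""
    kg.setdefault(first_class, {})[related_class] = strength
```

For example, if a packet-loss series repeats one sample later as a latency pattern, the function returns the packet-loss KPI with a lag of 1 and r close to 1.0. `amend_kg` then records a "latency" to "network" edge. A candidate that leads the first KPI with a strong coefficient is a plausible but unproven root cause. Correlation alone does not establish causation, so reporting the strength alongside the diagnosis, as the claim recites, lets an operator judge how much weight to give it.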