US 12,367,094 B2
Self-optimizing context-aware problem identification from information technology incident reports
Mantinder Jit Singh, Vancouver (CA); Somesh Kumar Srivastava, Pune (IN); and Ajoy Kumar, Santa Clara, CA (US)
Assigned to BMC Helix, Inc., Houston, TX (US)
Filed by BMC Helix, Inc., Houston, TX (US)
Filed on Sep. 30, 2021, as Appl. No. 17/449,538.
Prior Publication US 2023/0100716 A1, Mar. 30, 2023
Int. Cl. G06F 11/07 (2006.01); G06F 18/23213 (2023.01); G06N 20/00 (2019.01)

CPC G06F 11/0781 (2013.01) [G06F 11/0709 (2013.01); G06F 11/0769 (2013.01); G06F 11/079 (2013.01); G06F 18/23213 (2023.01); G06N 20/00 (2019.01)]

12 Claims

1. A computer-implemented method for identifying problems from information technology service management (ITSM) incident reports based on textual data contained in the ITSM incident reports, the method comprising:

converting a plurality of ITSM incident reports from textual data to a plurality of vectors using an encoder, the encoder using a word embedding algorithm, wherein the plurality of ITSM incident reports includes at least thousands of ITSM incident reports;

automatically determining, without user input, a base cluster number using both current incident report data and historic incident report data;

selecting parameters using the plurality of ITSM incident reports by ranking and scoring fields from the plurality of ITSM incident reports by performing data analysis using a cardinality and variability of categorical fields and a uniqueness of records and length of text in textual fields, the parameters including the automatically determined base cluster number, a threshold value for determining cluster quality, a number of subclusters, a maximum number of recursive iterations, and a minimum cluster size obtained from the data analysis;

inputting the plurality of vectors and the parameters, including the automatically determined base cluster number, to a machine learning module using an unsupervised machine learning clustering algorithm, the unsupervised machine learning clustering algorithm including a k-means unsupervised machine learning algorithm;

generating and outputting a base group of clusters using the unsupervised machine learning clustering algorithm by aligning the plurality of vectors into n-dimensional space, determining Euclidean distances, and shifting centroids to form the base group of clusters;

computing a cluster quality score for each of the base group of clusters, the cluster quality score based on a ratio of a cluster inertia value to a number of per cluster data points;

recursively splitting each cluster from the base group of clusters with the cluster quality score above the threshold value into new clusters until the cluster quality score for each cluster in the new clusters is below the threshold value;

outputting a final group of clusters based on the number of subclusters, the maximum number of recursive iterations, and the minimum cluster size obtained from the data analysis, wherein each cluster from the final group of clusters represents ITSM incident reports related to a same problem; and

automatically generating a multi-word label for each cluster from the final group of clusters using terms from an incident report closest to a centroid of the cluster for the multi-word label.
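
The clustering steps recited in claim 1 (the per-cluster quality score, the recursive splitting, and the selection of the report closest to a centroid) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the plain Lloyd's k-means with random seeding, the choice to stop when the score is at or below the threshold, and the way the minimum cluster size is enforced are all assumptions made for illustration. The input vectors stand in for the encoder output.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def quality_score(points):
    """Cluster inertia divided by the number of points in the cluster."""
    c = centroid(points)
    inertia = sum(math.dist(p, c) ** 2 for p in points)
    return inertia / len(points)


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (assumed variant); returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires k <= len(points)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each vector to its nearest center by Euclidean distance.
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Shift each center to the mean of its group; keep it if the group is empty.
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]


def split_recursively(points, threshold, n_sub, max_depth, min_size, depth=0):
    """Split a cluster while its score exceeds the threshold and the limits allow."""
    if (quality_score(points) <= threshold
            or depth >= max_depth
            or len(points) < max(n_sub, 2 * min_size)):
        return [points]
    parts = kmeans(points, n_sub)
    # Assumed rule: reject a split that yields fewer than two parts or an undersized part.
    if len(parts) < 2 or any(len(part) < min_size for part in parts):
        return [points]
    result = []
    for part in parts:
        result.extend(
            split_recursively(part, threshold, n_sub, max_depth, min_size, depth + 1)
        )
    return result


def representative(points):
    """Return the vector closest to the cluster centroid."""
    c = centroid(points)
    return min(points, key=lambda p: math.dist(p, c))


# Hypothetical toy data: two tight groups of 2-D vectors, far apart.
vectors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
final_clusters = split_recursively(vectors, threshold=1.0, n_sub=2,
                                   max_depth=3, min_size=2)
```

In this sketch a multi-word label would be built from the terms of the incident report whose vector `representative` returns for each final cluster; mapping a vector back to its report text is left out.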