CPC G06N 7/01 (2023.01) [G05B 23/0283 (2013.01); G06F 16/2365 (2019.01)] | 9 Claims
1. A processor-implemented method for predictive maintenance of a High-Performance Computing (HPC) system, comprising:
monitoring a plurality of nodes of the HPC system, via one or more hardware processors;
collecting performance data from each of the plurality of nodes, via the one or more hardware processors;
performing abstraction and cleansing of the collected performance data, the abstraction and cleansing comprising:
sampling the collected performance data from each of the plurality of nodes, to generate a plurality of data samples;
removing outliers from the plurality of data samples, to generate pre-processed data;
expanding the pre-processed data by applying a time-series up-sampling process;
calculating a probability distribution function based on one or more statistical properties of the collected performance data after applying the time-series up-sampling process;
iteratively performing the following until a measured divergence of the collected performance data is below a threshold of divergence:
generating random samples of a pre-defined length based on the calculated probability distribution function;
measuring the divergence between the collected performance data and corresponding original data by processing the generated random samples;
comparing the measured divergence with the threshold of divergence; and
adding additional generated data to the collected performance data to reduce the measured divergence, if the measured divergence exceeds the threshold of divergence;
converting the collected performance data to a time-series data, after reducing the measured divergence below the threshold of divergence;
associating trend information with the time-series data;
performing data padding to generate padded time-series data, wherein the data padding comprises filling empty spaces present in the time-series data; and
smoothing the padded time-series data by applying mean-aggregating time-series down-sampling, to generate cleansed performance data; and
generating one or more predictions with respect to maintenance of at least one of the plurality of nodes of the HPC system by applying machine learning (ML) prediction to the cleansed performance data.
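The iterative divergence check in the abstraction and cleansing steps can be illustrated with a minimal Python sketch. It assumes the data has already been sampled, outlier-filtered, and up-sampled. It also assumes the probability distribution function is a normal distribution parameterized by the sample mean and standard deviation, and it measures divergence with a histogram-based Jensen-Shannon divergence. The claim names neither a distribution family nor a divergence measure, so both are illustrative substitutes, as are the function names `js_divergence` and `augment_until_converged` and the `max_iter` guard.

```python
import numpy as np


def js_divergence(p_samples: np.ndarray, q_samples: np.ndarray, bins: int = 30) -> float:
    """Jensen-Shannon divergence (natural log) between two sample sets on shared histogram bins."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(p_samples, bins=edges)[0].astype(float)
    q = np.histogram(q_samples, bins=edges)[0].astype(float)
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0  # 0 * log(0) is taken as 0; b > 0 wherever a > 0 because b = m
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def augment_until_converged(original, sample_length=200, threshold=0.02, max_iter=100, seed=0):
    """Grow a pool of synthetic samples until its divergence from `original` drops below `threshold`."""
    original = np.asarray(original, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed PDF: normal distribution from the sample mean and standard deviation.
    mu, sigma = original.mean(), original.std(ddof=1)
    generated = np.empty(0)
    divergence = float("inf")
    for _ in range(max_iter):
        # Generate random samples of a pre-defined length and add them to the pool.
        generated = np.concatenate([generated, rng.normal(mu, sigma, size=sample_length)])
        divergence = js_divergence(original, generated)
        if divergence < threshold:
            break
    # Return the collected data extended with the generated data, plus the final divergence.
    return np.concatenate([original, generated]), divergence


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    readings = rng.normal(50.0, 5.0, size=5000)  # synthetic stand-in for node metrics
    augmented, div = augment_until_converged(readings)
    print(len(augmented), round(div, 4))
```

Each pass appends a fresh batch to the generated pool, so sampling noise in the pool's histogram shrinks and the divergence falls until it crosses the threshold. `max_iter` bounds the loop in case the fitted distribution never matches the data closely enough; Jensen-Shannon is used here only because it is symmetric and bounded.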
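The later time-series steps (trend association, padding, and mean-aggregating down-sampling) can be sketched in the same way. Here gaps are represented as `NaN`, padding is done by linear interpolation, trend information is reduced to a least-squares slope, and down-sampling averages non-overlapping windows of `factor` samples. All of these choices, and the helper names, are hypothetical; the claim does not specify them.

```python
import numpy as np


def linear_trend(values: np.ndarray) -> float:
    """Least-squares slope over the non-missing samples (hypothetical 'trend information')."""
    idx = np.arange(len(values))
    known = ~np.isnan(values)
    slope, _intercept = np.polyfit(idx[known], values[known], 1)
    return float(slope)


def pad_gaps(values: np.ndarray) -> np.ndarray:
    """Fill NaN gaps by linear interpolation; leading/trailing gaps take the nearest known value."""
    idx = np.arange(len(values))
    known = ~np.isnan(values)
    return np.interp(idx, idx[known], values[known])


def mean_downsample(values: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping windows of `factor` samples; a trailing partial window is dropped."""
    n = (len(values) // factor) * factor
    return values[:n].reshape(-1, factor).mean(axis=1)


def cleanse_time_series(raw, factor: int = 4) -> dict:
    """Trend, then padding, then mean-aggregating down-sampling, in the order the claim lists them."""
    values = np.asarray(raw, dtype=float)
    trend = linear_trend(values)  # computed before padding, so filled values do not bias it
    padded = pad_gaps(values)
    cleansed = mean_downsample(padded, factor)
    return {"trend_slope": trend, "cleansed": cleansed}


if __name__ == "__main__":
    series = [1.0, 2.0, float("nan"), 4.0, 5.0, 6.0, float("nan"), 8.0]
    result = cleanse_time_series(series, factor=4)
    print(result["trend_slope"], result["cleansed"])
```

In the usage example the two gaps are filled with 3.0 and 7.0, the padded series becomes 1 through 8, and averaging windows of four yields a cleansed series of [2.5, 6.5].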