US 12,175,389 B2
Method and system for predictive maintenance of high performance systems (HPC)
Rajesh Gopalrao Kulkarni, Pune (IN); Amit Kalele, Pune (IN); Anubhav Jain, Pune (IN); Sanjay Lalwani, Pune (IN); and Pradeep Gameria, Pune (IN)
Assigned to TATA CONSULTANCY SERVICES LIMITED, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Sep. 22, 2021, as Appl. No. 17/481,552.
Claims priority of application No. 202121027797 (IN), filed on Jun. 21, 2021.
Prior Publication US 2023/0026064 A1, Jan. 26, 2023
Int. Cl. G05B 23/02 (2006.01); G06F 16/23 (2019.01); G06N 7/01 (2023.01)
CPC G06N 7/01 (2023.01) [G05B 23/0283 (2013.01); G06F 16/2365 (2019.01)]
9 Claims
 
1. A processor-implemented method for predictive maintenance of a High-Performance Computing (HPC) system, comprising:
  monitoring a plurality of nodes of the HPC system, via one or more hardware processors;
  collecting performance data from each of the plurality of nodes, via the one or more hardware processors;
  performing abstraction and cleansing of the collected performance data, the abstraction and cleansing comprising:
    sampling the collected performance data from each of the plurality of nodes, to generate a plurality of data samples;
    removing outliers from the plurality of data samples, to generate pre-processed data;
    expanding the pre-processed data by applying a time-series up-sampling process;
    calculating a probability distribution function based on one or more statistical properties of the collected performance data after applying the time-series up-sampling process;
    iteratively performing till a measured divergence of the collected performance data is below a threshold of divergence:
      generating random samples of pre-defined length based on the calculated probability distribution;
      measuring the divergence between the collected performance data and a corresponding original data, by processing the generated random samples;
      comparing the measured divergence with the threshold of divergence; and
      adding an additional generated data to the collected performance data to reduce the measured divergence, if the measured divergence exceeds the threshold of divergence;
    converting the collected performance data to a time-series data, after reducing the measured divergence below the threshold of divergence;
    associating a trend information with the time-series data;
    performing data padding to generate padded time series data, wherein the data padding comprises filling up empty spaces present in the time series data; and
    smoothening the padded time-series data by applying a mean aggregating timeseries downsampling, to generate a cleansed performance data; and
  generating one or more predictions with respect to maintenance of at least one of the plurality of nodes of the HPC, by applying a machine learning (ML) prediction on the cleansed performance data.
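
The abstraction and cleansing steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every function name, parameter, and default below is hypothetical. The claim does not name an outlier rule, an interpolation scheme, a distribution family, a divergence measure, or a padding rule; the sketch assumes a z-score cut-off, linear interpolation, a normal distribution fitted to the up-sampled data, a closed-form Gaussian Kullback-Leibler divergence, and forward fill. It also assumes that a generated batch is kept only when it lowers the divergence, which is one possible reading of "adding an additional generated data ... to reduce the measured divergence". Trend association and the ML prediction step are not covered.

```python
import math
import random
import statistics


def remove_outliers(samples, z_max=3.0):
    """Drop values more than z_max standard deviations from the mean (assumed rule)."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    if std == 0:
        return list(samples)
    return [x for x in samples if abs(x - mean) <= z_max * std]


def upsample_linear(samples, factor=2):
    """Time-series up-sampling by linear interpolation (assumed scheme)."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend(a + (b - a) * k / factor for k in range(factor))
    out.append(samples[-1])
    return out


def gaussian_kl(p, q, eps=1e-9):
    """KL divergence between normal fits of p and q (assumed divergence measure)."""
    mu_p, mu_q = statistics.fmean(p), statistics.fmean(q)
    sd_p = max(statistics.pstdev(p), eps)
    sd_q = max(statistics.pstdev(q), eps)
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2) - 0.5)


def augment_until_close(data, original, threshold=0.01, length=20,
                        max_iter=200, seed=0):
    """Add generated batches until the divergence from `original` is below threshold.

    A batch is kept only if it lowers the divergence (assumption); max_iter
    bounds the loop because convergence is not guaranteed.
    """
    rng = random.Random(seed)
    data = list(data)
    # Distribution fitted to the up-sampled data (normal family is an assumption).
    mu, sd = statistics.fmean(data), statistics.pstdev(data)
    divergence = gaussian_kl(data, original)
    for _ in range(max_iter):
        if divergence < threshold:
            break
        batch = [rng.gauss(mu, sd) for _ in range(length)]
        candidate = gaussian_kl(data + batch, original)
        if candidate < divergence:
            data, divergence = data + batch, candidate
    return data, divergence


def pad_gaps(series):
    """Fill None gaps by forward fill; leading gaps take the first known value."""
    last = next(x for x in series if x is not None)
    out = []
    for x in series:
        if x is not None:
            last = x
        out.append(last)
    return out


def downsample_mean(series, window=2):
    """Mean-aggregating down-sampling over non-overlapping windows."""
    return [statistics.fmean(series[i:i + window])
            for i in range(0, len(series), window)]
```

A caller would chain these in claim order: `remove_outliers`, `upsample_linear`, `augment_until_close` against the original samples, then `pad_gaps` and `downsample_mean` on the resulting time series. The iteration cap and the accept-only-if-better rule are design choices of this sketch, made so that the loop always terminates and the returned divergence never exceeds the starting value.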