CPC G06F 21/552 (2013.01) [G06F 2221/034 (2013.01)] | 21 Claims |
1. A method comprising:
receiving, by a computer system, a first plurality of process data records from a computing cluster running a distributed application, the distributed application being composed of a workload comprising one or more pods, the first plurality of process data records including first information regarding software processes running within the one or more pods over a first time period;
building, by the computer system, a reference model based on the first plurality of process data records, the reference model capturing normal process behavior of the workload and the one or more pods;
receiving, by the computer system, a second plurality of process data records from the computing cluster, the second plurality of process data records including second information regarding software processes running within the one or more pods over a second time period;
comparing, by the computer system, the second plurality of process data records to the reference model;
upon detecting a deviation between the second plurality of process data records and the reference model, generating, by the computer system, a record indicating an anomaly in the workload or the one or more pods during the second time period; and
adjusting the resource allocation or restarting the one or more pods exhibiting the anomaly to restore normal operation of the distributed application.
|
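The steps of claim 1 can be illustrated with a minimal sketch. The claim does not specify record fields, the form of the reference model, or how a deviation is measured, so everything below is an assumption made for illustration: `ProcessRecord`, `AnomalyRecord`, `build_reference_model`, `detect_anomalies`, and `pods_to_restart` are hypothetical names, the model is simply the set of executables seen per workload during the first time period, and a deviation is any executable absent from that set. The restart itself is left to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRecord:
    # Hypothetical record shape; the claim does not define specific fields.
    workload: str
    pod: str
    executable: str


@dataclass(frozen=True)
class AnomalyRecord:
    # Hypothetical shape for the "record indicating an anomaly".
    workload: str
    pod: str
    executable: str
    reason: str


def build_reference_model(records):
    """Map each workload to the executables observed in the first period."""
    model = defaultdict(set)
    for rec in records:
        model[rec.workload].add(rec.executable)
    return dict(model)


def detect_anomalies(model, records):
    """Compare second-period records to the model; report each deviation once."""
    anomalies = []
    seen = set()
    for rec in records:
        known = model.get(rec.workload, set())
        key = (rec.workload, rec.pod, rec.executable)
        if rec.executable not in known and key not in seen:
            seen.add(key)
            anomalies.append(
                AnomalyRecord(
                    rec.workload,
                    rec.pod,
                    rec.executable,
                    "executable not in reference model",
                )
            )
    return anomalies


def pods_to_restart(anomalies):
    """Return the sorted, de-duplicated pod names that showed an anomaly."""
    return sorted({a.pod for a in anomalies})


if __name__ == "__main__":
    # Illustrative data only; pod and workload names are made up.
    baseline = [
        ProcessRecord("checkout", "checkout-a", "/usr/bin/python3"),
        ProcessRecord("checkout", "checkout-b", "/usr/sbin/nginx"),
    ]
    observed = [
        ProcessRecord("checkout", "checkout-a", "/usr/bin/python3"),
        ProcessRecord("checkout", "checkout-b", "/bin/sh"),
    ]
    model = build_reference_model(baseline)
    for anomaly in detect_anomalies(model, observed):
        print(anomaly)
    print(pods_to_restart(detect_anomalies(model, observed)))
```

Keying the model by workload rather than by pod is a design choice of this sketch: pods of the same workload are usually replicas, so a process seen in one replica during the first period is treated as normal for all of them. A per-pod or frequency-based model would fit the claim language equally well.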