US 11,868,812 B2
Predictive scaling of container orchestration platforms
Laurentiu Gabriel Ghergu, Bucharest (RO); Filis Omer, Constanta (RO); Costel Moraru, Egmating (DE); and Erik Rueger, Ockenheim (DE)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Aug. 12, 2021, as Appl. No. 17/400,385.
Prior Publication US 2023/0050796 A1, Feb. 16, 2023
Int. Cl. G06F 9/50 (2006.01); G06N 3/045 (2023.01)
CPC G06F 9/505 (2013.01) [G06N 3/045 (2023.01); G06F 2209/5019 (2013.01)]
20 Claims
 
1. A computer-implemented method comprising:
training, by a management node of a container orchestration platform, a plurality of recurrent neural networks (RNNs) registered with the container orchestration platform in parallel for a plurality of time intervals using metric data collected by one or more computational nodes of the container orchestration platform, wherein the metric data is stored to persistent storage of the management node, wherein a winning RNN from the plurality of RNNs has a lowest measured mean absolute error (MAE) among the plurality of RNNs during a current time interval and provides a most accurate workload estimation strategy as measured by the MAEs of the plurality of RNNs, wherein each MAE is the mean of the absolute value of each prediction error of the metric data, and wherein the prediction error is the difference between an actual value for workload of the container orchestration platform and a future predicted value of the workload during the plurality of time intervals by each RNN of the plurality of RNNs;
loading, by the management node, the winning RNN into a controller component of the management node in the next time interval;
in accordance with the most accurate workload estimation strategy of the winning RNN in the next time interval:
dynamically creating, by the management node, one or more new computational nodes on the container orchestration platform, or
dynamically removing, by the management node, one or more preexisting nodes from the container orchestration platform; and
training, by the management node, a new RNN registered with the container orchestration platform using the metric data in the next time interval, wherein the new RNN is included in the plurality of RNNs based on a verification that a workload estimation strategy of the new RNN predicts workload of the container orchestration platform within an acceptable MAE above a set threshold.
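Illustrative note (not part of the claim): the selection and scaling steps recited above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The MAE is computed as the claim defines it (the mean of the absolute differences between actual and predicted workload), and the model with the lowest MAE for the interval is treated as the winner. The function names, the sample series, the model names `rnn_a` and `rnn_b`, and the fixed `capacity_per_node` rule used to turn a predicted workload into a node count are all hypothetical; the claim does not specify how a prediction maps to a number of nodes.

```python
from math import ceil


def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over one time interval."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def select_winner(actual, predictions_by_model):
    """Return (name, mae) for the model with the lowest MAE."""
    maes = {
        name: mean_absolute_error(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    winner = min(maes, key=maes.get)
    return winner, maes[winner]


def node_delta(predicted_load, current_nodes, capacity_per_node):
    """Positive result: nodes to create. Negative result: nodes to remove.

    Assumption: each node handles a fixed amount of workload, and at
    least one node is always kept.
    """
    needed = max(1, ceil(predicted_load / capacity_per_node))
    return needed - current_nodes


# Hypothetical workload observed during the current time interval.
actual = [100.0, 120.0, 150.0, 130.0]

# Hypothetical predictions made by two registered models for that interval.
predictions = {
    "rnn_a": [110.0, 115.0, 140.0, 135.0],
    "rnn_b": [90.0, 140.0, 170.0, 100.0],
}

winner, mae = select_winner(actual, predictions)

# Hypothetical forecast from the winning model for the next interval.
delta = node_delta(predicted_load=260.0, current_nodes=2, capacity_per_node=100.0)
print(winner, mae, delta)
```

In this sketch a positive `delta` corresponds to dynamically creating nodes and a negative `delta` to removing preexisting nodes. The admission check for a newly trained RNN is left out, because the claim does not state the threshold value.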