CPC G06N 5/04 (2013.01) [G06F 16/11 (2019.01); G06N 5/01 (2023.01); G06N 20/20 (2019.01)] | 18 Claims |
1. A non-transitory tangible computer readable storage medium having stored thereon a computer program for implementing a workload-oriented prediction of storage system response time, the computer program including a set of instructions which, when executed by a computer, cause the computer to perform a method comprising the steps of:
obtaining a set of training examples, the set of training examples including a plurality of training examples obtained from a plurality of storage systems, each training example being obtained from a respective one of the plurality of storage systems, and including physical configuration information of the storage system describing a number of storage engines and a number of back-end drive arrays of the respective one of the plurality of storage systems, workload features characterizing a workload processed by the respective one of the plurality of storage systems during a time interval, and storage system response time of the respective one of the plurality of storage systems when processing workload characterized by the workload features during the time interval;
clustering the set of training examples into K clusters according to the workload features, each cluster including a subset of the training examples, wherein K is an integer greater than 1 (K≥2); and
using each subset of training examples to train a respective supervised learning process for the cluster, to cause each supervised learning process to learn a respective regression between two independent variables, namely (i) the number of storage engines and the number of back-end drive arrays of the storage system and (ii) the workload features, and a dependent variable, the storage system response time;
wherein each of the supervised learning processes is a decision tree supervised learning process, each decision tree including a plurality of branches containing nodes and terminating at leaves, the nodes of the decision trees being the number of storage engines and the number of back-end drive arrays of the storage system, and the leaves of the decision trees being the learned storage system response times.
|
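The steps recited in claim 1 (obtain training examples, cluster them by workload features into K clusters, then train one decision tree per cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the claim: the tuple layout of a training example, the two workload features (IOPS in thousands and read fraction), the synthetic data values, the use of plain k-means as the clustering method, the sum-of-squared-errors split criterion, and all function names. The claim does not name a clustering algorithm or a tree-building procedure. The `predict` helper is likewise a hypothetical illustration of how the trained trees might be used and is not a recited step.

```python
import random
from statistics import mean

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to the given workload feature vector."""
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (an assumed choice). Features are used unscaled here;
    a real system would likely normalize them first."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(mean(col) for col in zip(*members))
    return [nearest(p, centers) for p in points], centers

def sse(rows):
    """Sum of squared errors of response times around their mean."""
    if not rows:
        return 0.0
    m = mean(y for _, y in rows)
    return sum((y - m) ** 2 for _, y in rows)

def build_tree(rows, depth=0, max_depth=3):
    """Regression tree whose internal nodes test the physical configuration
    (engine count, drive-array count) and whose leaves hold response times."""
    ys = [y for _, y in rows]
    if depth >= max_depth or len(set(ys)) == 1:
        return mean(ys)
    best = None
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows})[1:]:
            left = [r for r in rows if r[0][f] < t]
            right = [r for r in rows if r[0][f] >= t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:  # all configurations identical: no split possible
        return mean(ys)
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left": build_tree(left, depth + 1, max_depth),
            "right": build_tree(right, depth + 1, max_depth)}

def train(examples, k):
    """Cluster by workload features, then fit one tree per cluster."""
    labels, centers = kmeans([ex[2] for ex in examples], k)
    trees = []
    for c in range(k):
        rows = [((ex[0], ex[1]), ex[3])
                for ex, lab in zip(examples, labels) if lab == c]
        trees.append(build_tree(rows) if rows else None)
    return {"centers": centers, "trees": trees, "labels": labels}

def predict(model, engines, arrays, workload):
    """Hypothetical use: pick the nearest cluster, then walk its tree."""
    node = model["trees"][nearest(workload, model["centers"])]
    while isinstance(node, dict):
        go_left = (engines, arrays)[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node

# Synthetic examples (made up for illustration):
# (num_engines, num_drive_arrays, (iops_thousands, read_fraction), response_ms)
examples = [
    (1, 2, (10.0, 0.90), 2.0), (2, 4, (12.0, 0.80), 1.0),
    (1, 2, (11.0, 0.90), 2.0), (2, 4, (9.0, 0.85), 1.0),
    (1, 2, (100.0, 0.30), 8.0), (2, 4, (110.0, 0.20), 4.0),
    (1, 2, (105.0, 0.30), 8.0), (2, 4, (95.0, 0.25), 4.0),
]
model = train(examples, k=2)
```

In this sketch the trees split only on the engine and drive-array counts, mirroring the wherein clause, while the workload features are used solely to select the cluster. Whether an actual embodiment also feeds workload features into the trees is not settled by this example.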