US 11,915,153 B2
Workload-oriented prediction of response times of storage systems
Paulo Abelha Ferreira, Rio de Janeiro (BR); Adriana Bechara Prado, Niterói (BR); and Pablo Nascimento da Silva, Niterói (BR)
Assigned to Dell Products, L.P., Hopkinton, MA (US)
Filed by EMC IP HOLDING COMPANY LLC, Hopkinton, MA (US)
Filed on May 4, 2020, as Appl. No. 16/865,465.
Prior Publication US 2021/0342712 A1, Nov. 4, 2021
Int. Cl. G06N 5/04 (2023.01); G06F 16/11 (2019.01); G06N 20/20 (2019.01); G06N 5/01 (2023.01)
CPC G06N 5/04 (2013.01) [G06F 16/11 (2019.01); G06N 5/01 (2023.01); G06N 20/20 (2019.01)]
18 Claims
 
1. A non-transitory tangible computer readable storage medium having stored thereon a computer program for implementing a workload-oriented prediction of storage system response time, the computer program including a set of instructions which, when executed by a computer, cause the computer to perform a method comprising the steps of:
obtaining a set of training examples, the set of training examples including a plurality of training examples obtained from a plurality of storage systems, each training example being obtained from a respective one of the plurality of storage systems, and including physical configuration information of the storage system describing a number of storage engines and a number of back-end drive arrays of the respective one of the plurality of storage systems, workload features characterizing a workload processed by the respective one of the plurality of storage systems during a time interval, and storage system response time of the respective one of the plurality of storage systems when processing workload characterized by the workload features during the time interval;
clustering the set of training examples into K clusters according to the workload features, each cluster including a subset of the training examples, wherein K is an integer greater than 1 (K≥2); and
using each subset of training examples to train a respective supervised learning process for the cluster, to cause each supervised learning process to learn a respective regression between two independent variables, namely (i) the number of storage engines and the number of back-end drive arrays of the storage system and (ii) the workload features, and a dependent variable, namely the storage system response time;
wherein each of the supervised learning processes is a decision tree supervised learning process, each decision tree including a plurality of branches containing nodes and terminating at leaves, the nodes of the decision trees being the number of storage engines and the number of back-end drive arrays of the storage system, and the leaves of the decision trees being the learned storage system response times.
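
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes k-means as the clustering method and a small hand-written regression tree, neither of which the claim specifies. The function names, the two workload features (read fraction, I/O size in KB), and the sample data are all hypothetical. As a simplification, each tree splits only on the two configuration counts, and the workload features are used only to select the cluster.

```python
import random
from statistics import mean

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means over workload feature vectors (assumed clustering method)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]

def sse(ys):
    """Sum of squared errors around the mean; the split criterion used here."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(rows, ys, depth=0, max_depth=3):
    """Regression tree: nodes test a configuration count, leaves hold response times."""
    if depth >= max_depth or len(rows) < 2 or len(set(ys)) == 1:
        return {"leaf": mean(ys)}
    best = None
    for f in range(len(rows[0])):
        # Thresholds exclude the largest value so both sides are non-empty.
        for t in sorted(set(r[f] for r in rows))[:-1]:
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all rows identical, nothing to split on
        return {"leaf": mean(ys)}
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree([r for r, _ in lo], [y for _, y in lo], depth + 1, max_depth),
            "right": build_tree([r for r, _ in hi], [y for _, y in hi], depth + 1, max_depth)}

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the stored response time."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

def train(examples, k):
    """Cluster by workload features, then fit one tree per non-empty cluster."""
    centroids, labels = kmeans([e[2] for e in examples], k)
    model = []
    for c in range(k):
        subset = [e for e, lab in zip(examples, labels) if lab == c]
        if subset:
            tree = build_tree([(e[0], e[1]) for e in subset], [e[3] for e in subset])
            model.append((centroids[c], tree))
    return model

def predict(model, engines, drive_arrays, workload):
    """Route the workload to its nearest cluster, then query that cluster's tree."""
    _, tree = model[nearest(workload, [c for c, _ in model])]
    return predict_tree(tree, (engines, drive_arrays))

# Hypothetical examples: (engines, drive arrays, (read fraction, I/O size KB), response ms)
EXAMPLES = [
    (1, 2, (0.90, 8.0), 4.0), (2, 2, (0.88, 8.0), 3.0),
    (2, 4, (0.92, 9.0), 2.0), (4, 4, (0.90, 7.0), 1.0),
    (1, 2, (0.20, 128.0), 12.0), (2, 2, (0.25, 120.0), 9.0),
    (2, 4, (0.15, 130.0), 7.0), (4, 4, (0.20, 125.0), 5.0),
]

model = train(EXAMPLES, k=2)
print(predict(model, 2, 4, (0.9, 8.0)))
```

Training one tree per workload cluster means each tree only has to capture how response time varies with configuration for similar workloads. In practice the workload features would likely be normalized before clustering, since the raw I/O size dominates the distance in this toy data.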