CPC G06N 5/04 (2013.01) [G06F 11/302 (2013.01); G06F 11/3495 (2013.01); G06N 20/00 (2019.01)]; 20 Claims

1. A computer-implemented method comprising:
measuring a data performance measurement of a shared-nothing architecture of a Hadoop and Hive implementation of a computer system, the data performance measurement including a number of queries performed on data stored in the computer system over a time series, and dividing a total number of the queries according to data characteristics into queries to only partitioned data, queries to only bucketed data, queries to both partitioned and bucketed data, and queries to neither partitioned nor bucketed data;
forecasting, by executing a forecasting model, a future value of the data performance measurement on the time series of the computer system;
configuring a set of throughput model input parameters;
computing, by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement of the computer system, a throughput requirement by which the computer system is sized, the throughput requirement being computed without utilizing a bloom filter by adding together throughputs for the queries having each of the data characteristics, wherein the throughputs are determined by a number of queries for each data characteristic, a size of a Hive dataset stored on storage devices in a cluster, and a compression percentage;
determining a capacity requirement corresponding to the throughput requirement; and
deploying, according to the capacity requirement, a resource within the computer system.
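The measuring step of claim 1 divides the query count into four data characteristics. A minimal Python sketch of that division, assuming each logged query can be tagged with whether its target table is partitioned and/or bucketed. `QueryRecord`, `classify`, `count_by_characteristic`, and `measure_time_series` are hypothetical names for illustration, not part of the claim or of any Hadoop or Hive API.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class DataCharacteristic(Enum):
    """The four query categories named in the measuring step."""
    PARTITIONED_ONLY = "partitioned_only"
    BUCKETED_ONLY = "bucketed_only"
    PARTITIONED_AND_BUCKETED = "partitioned_and_bucketed"
    NEITHER = "neither"


@dataclass(frozen=True)
class QueryRecord:
    """Hypothetical query-log entry: layout of the table the query read."""
    table_is_partitioned: bool
    table_is_bucketed: bool


def classify(q: QueryRecord) -> DataCharacteristic:
    """Map one query to exactly one of the four data characteristics."""
    if q.table_is_partitioned and q.table_is_bucketed:
        return DataCharacteristic.PARTITIONED_AND_BUCKETED
    if q.table_is_partitioned:
        return DataCharacteristic.PARTITIONED_ONLY
    if q.table_is_bucketed:
        return DataCharacteristic.BUCKETED_ONLY
    return DataCharacteristic.NEITHER


def count_by_characteristic(queries) -> dict:
    """Divide the total query count by data characteristic (zeros included)."""
    counts = Counter(classify(q) for q in queries)
    return {c: counts.get(c, 0) for c in DataCharacteristic}


def measure_time_series(records, num_periods: int) -> list:
    """Build per-period counts from (period_index, QueryRecord) pairs."""
    series = [{c: 0 for c in DataCharacteristic} for _ in range(num_periods)]
    for period, q in records:
        series[period][classify(q)] += 1
    return series
```

Because the four categories are mutually exclusive and exhaustive, the per-category counts in each period always sum to that period's total query count.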
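Claim 1 does not name the forecasting model. As one illustrative choice, Holt's linear (double) exponential smoothing can project each data characteristic's query count one period ahead. The smoothing constants `alpha` and `beta`, the clamping of negative forecasts to zero, and the function names are assumptions made for this sketch only.

```python
from typing import Sequence


def holt_forecast(series: Sequence[float], horizon: int,
                  alpha: float = 0.5, beta: float = 0.3) -> list:
    """Holt's linear (double) exponential smoothing.

    Returns forecasts for steps 1..horizon past the end of `series`.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must be in (0, 1]")
    # Initialize level with the first value and trend with the first difference.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def forecast_next_counts(series_by_category: dict,
                         alpha: float = 0.5, beta: float = 0.3) -> dict:
    """Forecast next-period query counts per category (hypothetical helper).

    Query counts cannot be negative, so forecasts are clamped at zero.
    """
    return {
        cat: max(0.0, holt_forecast(s, 1, alpha, beta)[0])
        for cat, s in series_by_category.items()
    }
```

Holt's method is used here only because it captures a level and a linear trend with two parameters; a seasonal model may fit query workloads with daily or weekly cycles better.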
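The computing step of claim 1 adds together per-characteristic throughputs without any bloom-filter reduction. The claim states only that each throughput depends on the query count, the Hive dataset size, and a compression percentage. The formula below is a minimal sketch under hypothetical assumptions: the compression percentage is read as the percent by which compression shrinks the data, each query category scans an assumed `scan_fraction` of the dataset, and the period length, per-node throughput, and `headroom` factor stand in for the claimed throughput model input parameters.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ThroughputParams:
    """Hypothetical throughput model input parameters."""
    dataset_size_gb: float   # size of the Hive dataset on the cluster's storage
    compression_pct: float   # assumed: percent by which compression shrinks data
    # Assumed fraction of the dataset read by one query of each category.
    scan_fraction: dict = field(default_factory=lambda: {
        "partitioned_only": 0.10,
        "bucketed_only": 0.25,
        "partitioned_and_bucketed": 0.05,
        "neither": 1.00,
    })
    period_seconds: float = 3600.0  # length of one time-series period

    def __post_init__(self):
        if not 0 <= self.compression_pct < 100:
            raise ValueError("compression_pct must be in [0, 100)")


def throughput_requirement(forecast_counts: dict, params: ThroughputParams) -> float:
    """Sum per-category throughputs in GB/s; no bloom-filter term is applied."""
    effective_gb = params.dataset_size_gb * (1 - params.compression_pct / 100)
    total_gb = 0.0
    for category, n_queries in forecast_counts.items():
        total_gb += n_queries * effective_gb * params.scan_fraction[category]
    return total_gb / params.period_seconds


def capacity_requirement(throughput_gb_s: float, node_throughput_gb_s: float,
                         headroom: float = 1.2) -> int:
    """Nodes needed to sustain the throughput with a safety margin (hypothetical)."""
    return max(1, math.ceil(throughput_gb_s * headroom / node_throughput_gb_s))


def nodes_to_deploy(required_nodes: int, current_nodes: int) -> int:
    """Additional nodes to deploy so capacity meets the requirement."""
    return max(0, required_nodes - current_nodes)
```

In this sketch, a query to neither partitioned nor bucketed data is charged a full scan of the compressed dataset, so workloads dominated by that category drive the capacity requirement.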