US 12,423,596 B2
Throughput based sizing for hive deployment
Anand Ayyappan, Mumbai (IN); Shapur Wadia, Mumbai (IN); Shalaka Verma, Mumbai (IN); Seema Nagar, Bangalore (IN); Kuntal Dey, New Delhi (IN); Sanjeev Kumar, Pune (IN); and Jayrama Sarma Praturi, Andhra Pradesh (IN)
Assigned to Kyndryl, Inc., New York, NY (US)
Filed by Kyndryl, Inc., New York, NY (US)
Filed on Jul. 1, 2020, as Appl. No. 16/918,540.
Prior Publication US 2022/0004895 A1, Jan. 6, 2022
Int. Cl. G06N 5/04 (2023.01); G06F 11/30 (2006.01); G06F 11/34 (2006.01); G06N 20/00 (2019.01)
CPC G06N 5/04 (2013.01) [G06F 11/302 (2013.01); G06F 11/3495 (2013.01); G06N 20/00 (2019.01)]
20 Claims
1. A computer-implemented method comprising:
measuring a data performance measurement of a shared-nothing architecture of a Hadoop and Hive implementation of a computer system, the data performance measurement including a number of queries performed on data stored in the computer system over a time series, the data performance measurement dividing a total number of queries according to data characteristics including queries to only partitioned data, only bucketed data, both partitioned and bucketed data, and neither partitioned nor bucketed data;
forecasting, by executing a forecasting model, a future value of the data performance measurement on the time series of the computer system;
configuring a set of throughput model input parameters;
computing, by executing a throughput model using the set of throughput model input parameters and the future value of the data performance measurement of the computer system, a throughput requirement for the computer system being sized by computing without utilizing a bloom filter based on adding together throughputs for the queries with the data characteristics, wherein the throughputs are determined by a number of queries for the data characteristics, a size of a Hive dataset of storage devices in a cluster, and a compression percentage;
determining a capacity requirement corresponding to the throughput requirement; and
deploying, according to the capacity requirement, a resource within the computer system.
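The steps of claim 1 can be illustrated with a minimal sketch. The claim does not specify the forecasting model, the throughput formula, or the capacity mapping, so everything below is an assumption made for illustration: a least-squares trend line stands in for the forecasting model, the hypothetical `SCAN_FRACTION` table supplies the share of the dataset each query category reads, and capacity is a node count derived from a hypothetical per-node throughput. Only the four query categories, the Hive dataset size, and the compression percentage come from the claim.

```python
from dataclasses import dataclass
from math import ceil

# The four query categories named in claim 1.
CATEGORIES = ("partitioned", "bucketed", "partitioned_and_bucketed", "neither")

# ASSUMPTION: fraction of the dataset scanned per query in each category.
# These numbers are illustrative only and do not come from the patent.
SCAN_FRACTION = {
    "partitioned": 0.10,
    "bucketed": 0.25,
    "partitioned_and_bucketed": 0.05,
    "neither": 1.00,
}


def forecast_next(series):
    """Forecast the next point of a series with a least-squares line.

    ASSUMPTION: a linear trend stands in for the claim's unspecified
    forecasting model. Negative forecasts are clamped to zero.
    """
    n = len(series)
    if n == 0:
        raise ValueError("series must contain at least one observation")
    if n == 1:
        return float(series[0])
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    return max(0.0, mean_y + slope * (n - mean_x))


@dataclass
class ThroughputParams:
    """Throughput model input parameters (names are hypothetical)."""
    dataset_size_gb: float   # size of the Hive dataset
    compression_pct: float   # percentage by which stored data is reduced (0-100)
    window_seconds: float    # length of one time-series interval


def throughput_requirement(forecast_counts, params):
    """Add together the per-category throughputs, in GB per second.

    ASSUMPTION: each category contributes
    query count x compressed dataset size x scan fraction.
    No Bloom filter is involved in the computation.
    """
    compressed_gb = params.dataset_size_gb * (1 - params.compression_pct / 100)
    total_gb = sum(
        forecast_counts[cat] * compressed_gb * SCAN_FRACTION[cat]
        for cat in CATEGORIES
    )
    return total_gb / params.window_seconds


def nodes_required(throughput_gbps, per_node_gbps):
    """ASSUMPTION: capacity is a node count at a fixed per-node throughput."""
    return ceil(throughput_gbps / per_node_gbps)


# Hypothetical query counts per interval, split by data characteristic.
history = {
    "partitioned": [100, 110, 120],
    "bucketed": [50, 50, 50],
    "partitioned_and_bucketed": [20, 30, 40],
    "neither": [4, 3, 2],
}
forecast = {cat: forecast_next(history[cat]) for cat in CATEGORIES}
params = ThroughputParams(dataset_size_gb=1000, compression_pct=50, window_seconds=3600)
required = throughput_requirement(forecast, params)
print(f"throughput: {required:.3f} GB/s, nodes: {nodes_required(required, 0.5)}")
```

In this sketch the final deploying step would consume the node count, for example by requesting that many worker nodes from a cluster manager; that integration is omitted because the claim does not describe it.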