US 12,231,491 B1
Efficient serverless method and system of serving artificial intelligence models
Bo Song, Xian (CN); Jun Wang, Xian (CN); Dong Hai Yu, Xian (CN); Yao Dong Liu, Xian (CN); Xiao Ming Ma, Xian (CN); and Jiang Bo Kang, Xian (CN)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Nov. 14, 2023, as Appl. No. 18/509,247.
Int. Cl. H04L 67/1008 (2022.01); H04L 41/16 (2022.01); H04L 67/1012 (2022.01)
CPC H04L 67/1008 (2013.01) [H04L 41/16 (2013.01); H04L 67/1012 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer program product for forecasting server demand, the computer program product comprising:
one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising computer readable instructions that, when executed, cause a computer device to carry out a method of forecasting server demand, the method comprising:
collecting, by a computer processor, a historical number of scoring requests for artificial intelligence (AI) models from a network using a serverless architecture;
determining, by the computer processor, a per-server capacity of scoring requests for AI models using the historical number of scoring requests for AI models;
generating, by the computer processor, a prediction model, wherein the prediction model predicts a first future value of scoring requests for AI models for a first future time span;
determining, by the computer processor, a current number of servers in a pool of servers handling the scoring requests for AI models from the network using the serverless architecture;
determining, by the computer processor and using the prediction model, whether the current number of servers is capable of handling the first future value of scoring requests for AI models for the first future time span;
upon determining that the current number of servers is incapable of handling the first future value of scoring requests for AI models:
warming up, by the computer processor, one or more additional servers; and
adding, by the computer processor, the warmed-up additional servers to the pool of servers prior to an arrival of the first future time span.
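The claimed method amounts to a forecast-then-prewarm autoscaling loop: predict the volume of scoring requests for the next time span, compare it against the pool's current capacity, and warm up extra servers before the span arrives. A minimal sketch follows; it uses a simple moving average as a placeholder for the claimed prediction model, and all function and parameter names are illustrative, not from the patent.

```python
from statistics import mean

def forecast_needed_servers(history, capacity_per_server, current_servers):
    """Return how many additional servers to warm up before the next
    time span.

    history            -- historical counts of scoring requests per span
    capacity_per_server -- scoring requests one server can handle per span
    current_servers    -- servers currently in the pool

    A moving average over the last three spans stands in for the
    prediction model; any forecaster could be substituted.
    """
    predicted = mean(history[-3:])                      # first future value
    needed = -(-int(predicted) // capacity_per_server)  # ceiling division
    return max(0, needed - current_servers)

# If the pool cannot handle the predicted load, the surplus count is the
# number of servers to warm up and add before the future span arrives.
extra = forecast_needed_servers([90, 100, 110], 25, 3)
```

In this illustration, a predicted volume of 100 requests against a per-server capacity of 25 requires 4 servers, so a pool of 3 would warm up 1 more; when the pool already covers the forecast, the function returns 0 and no warm-up occurs.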