US 12,450,100 B2
AI model based deployment of an AI model
Aladin Djuhera, Dachau (DE); Alecio Pedro Delazari Binotto, Munich (DE); Fernando Luiz Koch, Palm Beach Gardens, FL (US); and Rob High, Round Rock, TX (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Dec. 14, 2023, as Appl. No. 18/539,572.
Claims priority of application No. 2315267 (GB), filed on Oct. 5, 2023.
Prior Publication US 2025/0117262 A1, Apr. 10, 2025
Int. Cl. G06F 9/50 (2006.01); H04L 67/1004 (2022.01)
CPC G06F 9/5072 (2013.01) [G06F 9/5027 (2013.01); H04L 67/1004 (2013.01)]
20 Claims
 
1. A method for executing workloads in a distributed system using a first artificial intelligence (AI) model, comprising:
the distributed system comprising a set of first computer systems which are configured to connect to at least one second computer system of the distributed system,
the first AI model being configured to receive a specific input, process the specific input and provide a specific output, the first AI model being configured to be split in accordance with a split configuration into a set of one or more input blocks, an intermediate block, and a set of one or more output blocks, such that the set of one or more input blocks receives the specific input and provides an intermediate output, the intermediate block receives as input the intermediate output and provides another intermediate output, and the set of one or more output blocks receives as input the other intermediate output and provides said specific output;
the method further comprising:
receiving a request to execute a workload using the first AI model, the workload comprising the specific input;
determining a current resource utilization status in the distributed system;
inputting, to a second AI model, the current resource utilization status and an in-use split configuration of the first AI model, the second AI model being configured for predicting a split configuration for the first AI model;
receiving an output from the second AI model, the output indicating a current split configuration for the first AI model;
splitting the first AI model using the current split configuration;
deploying the split first AI model such that the at least one input block and the at least one output block are executed on the set of first computer systems and such that the intermediate block is executed on the at least one second computer system; and
executing the workload.
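
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the first AI model is a plain sequence of layers and that a split configuration is a pair of layer indices; `SplitConfig`, `split_model`, `predict_split`, `run_block`, and `execute_workload` are hypothetical names introduced here. The second AI model is replaced by a simple rule-based stand-in, the current resource utilization status is passed in rather than measured, and deployment is simulated in a single process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Layer = Callable[[float], float]  # assumption: a layer maps one number to another


@dataclass(frozen=True)
class SplitConfig:
    """Hypothetical split configuration: two cut points in the layer sequence."""
    input_end: int     # layers [0, input_end) form the input block(s)
    output_start: int  # layers [output_start, n) form the output block(s)


def split_model(layers: Sequence[Layer], cfg: SplitConfig
                ) -> Tuple[List[Layer], List[Layer], List[Layer]]:
    """Split the first model into input, intermediate, and output blocks."""
    if not 0 < cfg.input_end < cfg.output_start < len(layers):
        raise ValueError("each block needs at least one layer")
    return (list(layers[:cfg.input_end]),
            list(layers[cfg.input_end:cfg.output_start]),
            list(layers[cfg.output_start:]))


def predict_split(utilization: Dict[str, float], in_use: SplitConfig,
                  n_layers: int) -> SplitConfig:
    """Rule-based stand-in for the second AI model (assumption, not a trained model).

    Takes the current utilization and the in-use split, returns the current split.
    """
    first, second = utilization["first"], utilization["second"]
    if first > second:
        # First systems are busier: move layers into the intermediate block.
        return SplitConfig(max(1, in_use.input_end - 1),
                           min(n_layers - 1, in_use.output_start + 1))
    if second > first and in_use.input_end + 1 < in_use.output_start - 1:
        # Second system is busier: move layers back to the first systems.
        return SplitConfig(in_use.input_end + 1, in_use.output_start - 1)
    return in_use


def run_block(block: Sequence[Layer], x: float) -> float:
    """Apply the layers of one block in order."""
    for layer in block:
        x = layer(x)
    return x


def execute_workload(layers: Sequence[Layer], specific_input: float,
                     utilization: Dict[str, float], in_use: SplitConfig
                     ) -> Tuple[float, SplitConfig]:
    """Predict a split, split the model, and run the blocks in sequence."""
    cfg = predict_split(utilization, in_use, len(layers))
    input_block, intermediate_block, output_block = split_model(layers, cfg)
    # Simulated placement: input/output blocks on the first computer systems,
    # intermediate block on the second computer system.
    intermediate_output = run_block(input_block, specific_input)
    other_intermediate_output = run_block(intermediate_block, intermediate_output)
    specific_output = run_block(output_block, other_intermediate_output)
    return specific_output, cfg
```

In this sketch the specific output does not depend on where the cut points fall, because the blocks are executed in the original layer order; only the placement of layers between the first and second computer systems changes.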