| CPC G06F 9/45504 (2013.01) [G06F 9/45558 (2013.01); G06F 9/5083 (2013.01); G06F 40/30 (2020.01); G06F 2009/45562 (2013.01)] | 20 Claims |

|
1. A system for orchestrating multiple artificial intelligence (AI) agents, comprising:
a unified user interface for receiving a user request in one of a plurality of formats;
a Large Language Model (LLM) configured to extract user intent from the received user request;
a domain-specific database constructed based on domain-specific materials;
a plurality of AI agents instantiated as containerized instances within a cloud computing environment, wherein:
a first subset of the plurality of AI agents is maintained in an active state, each provisioned with processor cycles, memory, and network bandwidth to handle real-time user requests; and
a second subset of the plurality of AI agents is maintained in an inactive state, wherein execution containers or virtual machines for the second subset of the plurality of AI agents require initialization before invocation;
a model orchestration subsystem implemented as software instructions executable by one or more processors, the model orchestration subsystem configured to:
determine, based on the extracted user intent, whether the received user request needs to be processed locally using the domain-specific database or by invoking one or more of the plurality of AI agents;
in response to the received user request being processed by invoking one or more of the plurality of AI agents:
identify one or more candidate AI agents from the plurality of AI agents based on the extracted user intent;
retrieve model metrics of the one or more candidate AI agents, wherein the retrieved model metrics of candidate AI agents in the active state include real-time model metrics, and the retrieved model metrics of candidate AI agents in the inactive state include historical model metrics;
identify one or more target AI agents from the one or more candidate AI agents based on the retrieved model metrics;
for a first target AI agent of the one or more target AI agents, determine whether the first target AI agent is in an active or inactive state;
in response to the first target AI agent being in an inactive state, initiate a spin-up process comprising:
provisioning the processor cycles, memory, and network bandwidth for the first target AI agent using a cloud-based resource manager of the cloud computing environment;
instantiating a containerized execution environment associated with the first target AI agent using the provisioned processor cycles, memory, and network bandwidth for the first target AI agent;
loading a pre-trained model of the first target AI agent into the instantiated containerized execution environment;
monitoring an initialization state of the first target AI agent using system telemetry data until the first target AI agent reaches a ready state;
in response to the first target AI agent being in an active state or upon successful initialization of an inactive target AI agent, construct and send one or more prompts to the one or more target AI agents;
generate a response based on returned data from the one or more target AI agents;
return the generated response through the unified user interface;
for a second target AI agent of the identified one or more target AI agents that is in an active state, determine whether the real-time metrics of the second target AI agent indicate a degrading trend toward a predefined threshold;
preemptively spin up another instance of the second target AI agent in the cloud computing environment by provisioning the processor cycles, memory, and network bandwidth for the second target AI agent and instantiating a containerized execution environment associated with the second target AI agent using the provisioned processor cycles, memory, and network bandwidth for the second target AI agent; and
execute automatic load-balancing among all active instances of the second target AI agent.
|
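
The orchestration steps recited in claim 1 (candidate identification, metric-based target selection, spin-up of an inactive agent, detection of a degrading trend, and load balancing) can be illustrated with a minimal Python sketch. This is not the claimed implementation. Every name in it (`Agent`, `State`, `select_targets`, `spin_up`, `is_degrading`, `round_robin`) is hypothetical, and the claim does not specify which metric is used, how a trend is detected, or how load is balanced. The sketch assumes latency in milliseconds as the only metric (lower is better), a three-sample strictly rising window with a one-step linear projection as the trend test, and round-robin dispatch as the balancing policy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class State(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


@dataclass
class Agent:
    name: str
    state: State
    intents: Set[str]
    # Assumption: latency (ms) stands in for the claim's unspecified "model metrics".
    realtime_latency_ms: List[float] = field(default_factory=list)  # newest last
    historical_latency_ms: float = 0.0

    def metric(self) -> float:
        # Active agents are scored on real-time data, inactive ones on history.
        if self.state is State.ACTIVE and self.realtime_latency_ms:
            return self.realtime_latency_ms[-1]
        return self.historical_latency_ms


def select_targets(agents: List[Agent], intent: str, k: int = 1) -> List[Agent]:
    # Candidates are agents that declare support for the extracted intent.
    candidates = [a for a in agents if intent in a.intents]
    # Assumption: lower latency is better, so sort ascending and keep the top k.
    return sorted(candidates, key=lambda a: a.metric())[:k]


def spin_up(agent: Agent) -> List[str]:
    # Placeholder step names only; a real system would call a cloud
    # resource manager and poll telemetry until the agent reports ready.
    steps = ["provision_resources", "instantiate_container",
             "load_model", "wait_until_ready"]
    agent.state = State.ACTIVE
    return steps


def is_degrading(samples: List[float], threshold: float, window: int = 3) -> bool:
    # Assumed trend test: the last `window` samples rise strictly and a
    # one-step linear projection reaches the threshold.
    recent = samples[-window:]
    if len(recent) < window:
        return False
    rising = all(b > a for a, b in zip(recent, recent[1:]))
    slope = (recent[-1] - recent[0]) / (window - 1)
    return rising and recent[-1] + slope >= threshold


def round_robin(instances: List[str], n_requests: int) -> List[str]:
    # Assumed balancing policy: cycle through the active instances in order.
    return [instances[i % len(instances)] for i in range(n_requests)]


agents = [
    Agent("summarizer-a", State.ACTIVE, {"summarize"}, [120.0, 150.0, 185.0]),
    Agent("summarizer-b", State.INACTIVE, {"summarize"}, historical_latency_ms=90.0),
    Agent("translator", State.ACTIVE, {"translate"}, [60.0]),
]

target = select_targets(agents, "summarize")[0]
if target.state is State.INACTIVE:
    spin_up(target)

busy = agents[0]
instances = [busy.name]
if is_degrading(busy.realtime_latency_ms, threshold=200.0):
    instances.append(busy.name + "-replica")  # preemptive second instance
assignment = round_robin(instances, 4)
```

In this sketch the inactive agent is chosen because its historical latency beats the active agent's latest real-time sample, which is why the spin-up path runs before any prompt would be sent. The trend test deliberately fires before the threshold is crossed, mirroring the claim's "degrading trend toward a predefined threshold" rather than a simple threshold breach.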