CPC G06N 3/0455 (2023.01) [G06N 3/084 (2013.01)] | 20 Claims
1. A non-transitory computer-readable storage medium comprising instructions thereon, wherein the instructions, when executed by at least one data processor of a system, cause the system to:
receive, from a user device, an output generation request comprising a prompt for generation of a text-based output using a first large-language model (LLM) of a plurality of LLMs;
determine a performance metric associated with processing the output generation request;
determine a system state associated with system resources for processing requests using the first LLM of the plurality of LLMs;
calculate, based on the system state, a threshold metric value for the determined performance metric;
determine a first estimated performance metric value for the determined performance metric based on an indication of an estimated resource usage by the first LLM when processing the prompt included in the output generation request;
compare the first estimated performance metric value with the threshold metric value;
in response to determining that the first estimated performance metric value satisfies the threshold metric value:
provide the prompt to the first LLM to generate a first output by processing the prompt included in the output generation request; and
transmit the first output to a computing system enabling access to the first output by the user device;
in response to determining that the first estimated performance metric value does not satisfy the threshold metric value:
determine a second estimated performance metric value for the determined performance metric based on an indication of an estimated resource usage by a second LLM of the plurality of LLMs when processing the prompt included in the output generation request;
compare the second estimated performance metric value with the threshold metric value; and
in response to determining that the second estimated performance metric value satisfies the threshold metric value:
provide the prompt to the second LLM to generate a second output by processing the prompt included in the output generation request; and
transmit the second output to the computing system enabling access to the second output by the user device.
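The routing logic in claim 1 can be illustrated with a minimal Python sketch. This is not the patentee's implementation. It rests on several assumptions: the performance metric is fixed as latency in milliseconds, so "satisfies the threshold" means "estimate is at or below the threshold"; the threshold policy, the per-token cost model, and every name (`SystemState`, `ModelProfile`, `threshold_latency_ms`, `estimate_latency_ms`, `route`, `generate`) are hypothetical. The claim recites only a first and a second LLM. The loop below walks an ordered list of candidates, which reduces to that two-step fallback when the list has two entries.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SystemState:
    # Hypothetical snapshot of resources available for serving LLM requests.
    gpu_utilization: float  # fraction of capacity in use, in [0, 1]
    queue_depth: int        # number of requests already waiting


@dataclass(frozen=True)
class ModelProfile:
    # Hypothetical cost profile for one LLM in the pool.
    name: str
    ms_per_token: float     # assumed latency per processed token
    overhead_ms: float      # assumed fixed per-request overhead


def threshold_latency_ms(state: SystemState, base_budget_ms: float = 2000.0) -> float:
    """Derive the threshold metric value from the system state.

    Illustrative policy only: the latency budget shrinks as utilization
    and queue depth grow.
    """
    headroom = max(0.0, 1.0 - state.gpu_utilization)
    return base_budget_ms * (0.5 + 0.5 * headroom) / (1 + 0.1 * state.queue_depth)


def estimate_latency_ms(model: ModelProfile, prompt: str, max_new_tokens: int) -> float:
    """Estimate the metric from a crude resource-usage proxy.

    Whitespace-separated words stand in for prompt tokens; a real system
    would use the model's own tokenizer.
    """
    tokens = len(prompt.split()) + max_new_tokens
    return model.overhead_ms + model.ms_per_token * tokens


def route(
    prompt: str,
    models: Sequence[ModelProfile],
    state: SystemState,
    generate: Callable[[str, str], str],
    max_new_tokens: int = 256,
) -> Optional[Tuple[str, str]]:
    """Send the prompt to the first model whose estimate satisfies the threshold.

    Returns (model name, output), or None if no candidate fits; the claim
    does not say what happens in that case.
    """
    threshold = threshold_latency_ms(state)
    for model in models:  # first LLM, then second LLM, and so on
        estimate = estimate_latency_ms(model, prompt, max_new_tokens)
        if estimate <= threshold:  # for latency, "satisfies" means within budget
            return model.name, generate(model.name, prompt)
    return None
```

In use, `generate` would wrap the actual model call, and the returned output would then be transmitted to the computing system that exposes it to the user device. Under moderate load (`SystemState(0.5, 0)`, threshold 1500 ms), a large model profiled at 10 ms per token is skipped for a 256-token response and a smaller model at 2 ms per token is chosen instead, which mirrors the claim's fallback from the first LLM to the second.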