| CPC G06F 11/3419 (2013.01) [G06F 9/5055 (2013.01)] | 20 Claims |

|
1. A non-transitory computer-readable storage medium comprising instructions stored thereon, wherein the instructions when executed by at least one data processor of a system, cause the system to:
receive, from a computing device, a set of output generation requests, each comprising a prompt for generation of one or more responses by executing one or more artificial intelligence (AI) models on one or more hardware resources of a set of available hardware resources;
for each output generation request of the set of output generation requests:
using the prompt of the output generation request, generate a set of output attributes of the output generation request,
wherein the generated set of output attributes of the output generation request indicates: (1) a type of the output generated from the prompt and (2) a threshold response time of the generation of the output; and
using the set of output attributes, map the output generation request to a set of requested hardware resources by:
identifying one or more dependencies associated with processing the output generation request using the one or more AI models, and
using the identified dependencies, determining an estimated hardware resource usage associated with processing the output generation request using the one or more AI models;
dynamically partition the set of available hardware resources to determine, for each output generation request, a set of selected hardware resources within the set of available hardware resources using: (1) a compatibility between one or more hardware resources of the set of available hardware resources and the set of requested hardware resources and (2) one or more sets of requested hardware resources of other output generation requests in the set of output generation requests;
provide the prompt of each output generation request to a corresponding set of selected hardware resources to generate a set of outputs by processing the prompt included in the output generation request using the one or more AI models; and
responsive to the generated set of outputs, transmit, to the computing device, the output within the threshold response time.
|