US 12,292,811 B1
Dynamic system resource-sensitive model software and hardware selection
Sourabh Deb, Tampa, FL (US); Jason Engelbrecht, London (GB); Zheyu Wang, Shanghai (CN); and Haolin Jin, Shanghai (CN)
Assigned to CITIBANK, N.A., New York, NY (US)
Filed by Citibank, N.A., New York, NY (US)
Filed on Nov. 20, 2024, as Appl. No. 18/954,389.
Application 18/954,389 is a continuation of application No. 18/812,913, filed on Aug. 22, 2024.
Application 18/812,913 is a continuation in part of application No. 18/661,532, filed on May 10, 2024, granted, now 12,111,747.
Application 18/661,532 is a continuation in part of application No. 18/661,519, filed on May 10, 2024, granted, now 12,106,205.
Application 18/661,532 is a continuation in part of application No. 18/633,293, filed on Apr. 11, 2024, granted, now 12,147,513.
Int. Cl. G06F 11/34 (2006.01); G06F 9/50 (2006.01)

CPC G06F 11/3419 (2013.01) [G06F 9/5055 (2013.01)]

20 Claims

1. A non-transitory computer-readable storage medium comprising instructions stored thereon, wherein the instructions when executed by at least one data processor of a system, cause the system to:

receive, from a computing device, a set of output generation requests, each comprising a prompt for generation of one or more responses by executing one or more artificial intelligence (AI) models on one or more hardware resources of a set of available hardware resources;

for each output generation request of the set of output generation requests:

using the prompt of the output generation request, generate a set of output attributes of the output generation request,

wherein the generated set of output attributes of the output generation request indicate: (1) a type of the output generated from the prompt and (2) a threshold response time of the generation of the output; and

using the set of output attributes, map the output generation request to a set of requested hardware resources by:

identifying one or more dependencies associated with processing the output generation request using the one or more AI models, and

using the identified dependencies, determining an estimated hardware resource usage associated with processing the output generation request using the one or more AI models;

dynamically partition the set of available hardware resources to determine, for each output generation request, a set of selected hardware resources within the set of available hardware resources using: (1) a compatibility between one or more hardware resources of the set of available hardware resources and the set of requested hardware resources and (2) one or more sets of requested hardware resources of other output generation requests in the set of output generation requests;

provide the prompt of each output generation request to a corresponding set of selected hardware resources to generate a set of outputs by processing the prompt included in the output generation request using the one or more AI models; and

responsive to the generated set of outputs, transmit, to the computing device, the output within the threshold response time.
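
The flow recited in claim 1 (derive output attributes from each prompt, estimate hardware usage from dependencies, then partition the available hardware across all requests) can be illustrated with a minimal sketch. This is not the patented implementation. Every name below (`generate_attributes`, `DEPENDENCIES`, `estimate_usage`, `partition`), the keyword heuristic, the memory figures, and the greedy best-fit strategy are hypothetical assumptions chosen only to make the steps concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    request_id: str
    prompt: str


@dataclass(frozen=True)
class OutputAttributes:
    output_type: str          # type of output generated from the prompt
    threshold_seconds: float  # threshold response time


# Hypothetical dependency table: output type -> (dependency, GPU memory in GB).
DEPENDENCIES = {
    "text": [("tokenizer", 1), ("language_model", 8)],
    "image": [("tokenizer", 1), ("text_encoder", 2), ("diffusion_model", 16)],
}


def generate_attributes(prompt: str) -> OutputAttributes:
    """Assumed keyword heuristic standing in for attribute generation."""
    if "image" in prompt.lower():
        return OutputAttributes("image", 10.0)
    return OutputAttributes("text", 2.0)


def estimate_usage(attrs: OutputAttributes) -> int:
    """Sum the memory of every dependency needed for this output type."""
    return sum(gb for _, gb in DEPENDENCIES[attrs.output_type])


def partition(requests: list[Request],
              available: dict[str, int]) -> dict[str, Optional[str]]:
    """Greedy best-fit: tightest deadline first, smallest compatible device.

    Capacity consumed by one request reduces what the other requests can
    use, so each selection depends on the other requests in the set.
    """
    free = dict(available)  # remaining GB per device
    attrs = {r.request_id: generate_attributes(r.prompt) for r in requests}
    ordered = sorted(requests,
                     key=lambda r: attrs[r.request_id].threshold_seconds)
    assignment: dict[str, Optional[str]] = {}
    for r in ordered:
        need = estimate_usage(attrs[r.request_id])
        compatible = [d for d, gb in free.items() if gb >= need]
        if not compatible:
            assignment[r.request_id] = None  # no compatible hardware left
            continue
        device = min(compatible, key=lambda d: free[d])
        free[device] -= need
        assignment[r.request_id] = device
    return assignment


if __name__ == "__main__":
    reqs = [Request("r1", "Summarize this report"),
            Request("r2", "Draw an image of a cat")]
    print(partition(reqs, {"gpu_small": 12, "gpu_large": 24}))
```

In this sketch the text request, with the shorter assumed threshold, is placed first on the smallest device that fits it, which leaves the larger device free for the image request. A production scheduler would also have to handle queuing, preemption, and enforcement of the threshold response time, none of which is modeled here.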