CPC G06F 16/3329 (2019.01) [G06F 40/284 (2020.01)]　　21 Claims

1. A method of improving performance of large language models (LLMs), the method comprising:
receiving one or more context files via an application programming interface (API) at an input broker from a user interface;
generating one or more refined context files from the one or more context files using one or more refining LLMs;
sending the one or more refined context files to one or more h-LLMs via a cloud service API, the one or more h-LLMs being hosted in a cloud container environment;
receiving a user prompt via the API at the input broker from the user interface;
generating a plurality of derived prompts from the user prompt at the input broker;
transmitting the plurality of derived prompts to the one or more h-LLMs via the cloud service API;
receiving a plurality of h-LLM results at an output broker, the plurality of h-LLM results being generated responsive to both the one or more refined context files and the plurality of derived prompts;
processing the plurality of h-LLM results at the output broker to generate a responsive result; and
transmitting the responsive result to the user interface via the API.
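The sketch below is a minimal, illustrative reading of the claimed pipeline, not the patentee's implementation. All names (InputBroker, OutputBroker, call_h_llm, refine_context, derive_prompts) and the aggregation policy are assumptions; the refining-LLM and h-LLM calls are stubbed because the claim does not specify a concrete cloud service API.

```python
# Hypothetical sketch of claim 1: input broker -> refining LLM -> derived
# prompts -> h-LLMs -> output broker. LLM and cloud API calls are stubbed.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InputBroker:
    """Receives context files and the user prompt; prepares derived prompts."""
    refining_llm: Callable[[str], str]   # stand-in for a refining LLM
    num_derived_prompts: int = 3

    def refine_context(self, context_files: List[str]) -> List[str]:
        # Generate one refined context file per input context file.
        return [self.refining_llm(f) for f in context_files]

    def derive_prompts(self, user_prompt: str) -> List[str]:
        # Generate a plurality of derived prompts from the single user prompt
        # (here by simple rephrasing tags; a hypothetical strategy).
        return [f"[variant {i}] {user_prompt}" for i in range(self.num_derived_prompts)]


@dataclass
class OutputBroker:
    """Processes the plurality of h-LLM results into a single responsive result."""
    def process(self, h_llm_results: List[str]) -> str:
        # One possible aggregation policy: keep the longest (most detailed) result.
        return max(h_llm_results, key=len)


def call_h_llm(refined_context: List[str], prompt: str) -> str:
    # Stand-in for the cloud service API call to an h-LLM hosted in a cloud
    # container environment; a real system would issue a network request here.
    return f"answer to '{prompt}' using {len(refined_context)} refined context file(s)"


def answer(user_prompt: str, context_files: List[str]) -> str:
    input_broker = InputBroker(refining_llm=lambda text: text.strip().lower())
    output_broker = OutputBroker()

    refined = input_broker.refine_context(context_files)   # refined context files
    derived = input_broker.derive_prompts(user_prompt)      # derived prompts
    results = [call_h_llm(refined, p) for p in derived]     # h-LLM results
    return output_broker.process(results)                   # responsive result


if __name__ == "__main__":
    print(answer("Summarize the design doc.", ["Design Doc v1 ...", "Meeting notes ..."]))
```

In this reading, the input broker fans a single user prompt out into several derived prompts, each answered against the same refined context, and the output broker collapses the resulting plurality of h-LLM results into one responsive result returned to the user interface.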