US 12,405,977 B1
Method and system for optimizing use of retrieval augmented generation pipelines in generative artificial intelligence applications
Vijay Madisetti, Alpharetta, GA (US); and Arshdeep Bahga, Chandigarh (IN)
Assigned to Vijay Madisetti, Alpharetta, GA (US)
Filed by Vijay Madisetti, Alpharetta, GA (US)
Filed on Aug. 22, 2024, as Appl. No. 18/812,707.
Application 18/812,707 is a continuation in part of application No. 18/744,199, filed on Jun. 14, 2024, granted, now 12,306,859.
Application 18/744,199 is a continuation in part of application No. 18/406,906, filed on Jan. 8, 2024, granted, now 12,158,904, issued on Dec. 3, 2024.
Application 18/406,906 is a continuation in part of application No. 18/470,487, filed on Sep. 20, 2023, granted, now 12,147,461, issued on Nov. 19, 2024.
Claims priority of provisional application 63/551,548, filed on Feb. 9, 2024.
Claims priority of provisional application 63/604,909, filed on Dec. 1, 2023.
Claims priority of provisional application 63/604,910, filed on Dec. 1, 2023.
Claims priority of provisional application 63/602,675, filed on Nov. 27, 2023.
Int. Cl. G06F 16/3329 (2025.01); G06F 40/284 (2020.01)
CPC G06F 16/3329 (2019.01) [G06F 40/284 (2020.01)] 21 Claims
OG exemplary drawing
 
1. A method of improving performance of large language models (LLMs) comprising:
receiving one or more context files via an application programming interface (API) at an input broker from a user interface;
generating one or more refined context files from the one or more context files using one or more refining LLMs;
sending the one or more refined context files to one or more h-LLMs via a cloud service API, the one or more h-LLMs being hosted in a cloud container environment;
receiving a user prompt via the API at the input broker from the user interface;
generating a plurality of derived prompts from the user prompt at the input broker;
transmitting the plurality of derived prompts to the one or more h-LLMs via the cloud service API;
receiving a plurality of h-LLM results at an output broker, the plurality of h-LLM results being generated responsive to both the one or more refined context files and the plurality of derived prompts;
processing the plurality of h-LLM results at the output broker to generate a responsive result; and
transmitting the responsive result to the user interface via the API.
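One way to read the claimed data flow as code is the sketch below. It is purely illustrative, not the patented implementation: every class, function, and policy here (the refining step, the prompt-derivation strategy, the longest-answer combination rule) is a hypothetical stand-in, and the hosted h-LLM calls are stubbed as plain callables rather than cloud service API requests.

```python
# Hypothetical sketch of the claimed pipeline. All names are illustrative;
# the h-LLMs are stubbed as callables taking (prompt, context) -> text.

from dataclasses import dataclass
from typing import Callable, List

# Stand-in for a hosted h-LLM behind a cloud service API.
HLLM = Callable[[str, str], str]

@dataclass
class InputBroker:
    """Receives context files and user prompts; produces refined context
    files and a plurality of derived prompts."""
    refiner: Callable[[str], str]  # stand-in for a refining LLM

    def refine_context(self, context_files: List[str]) -> List[str]:
        return [self.refiner(c) for c in context_files]

    def derive_prompts(self, user_prompt: str) -> List[str]:
        # Illustrative derivations; a real system might use an LLM here.
        return [
            user_prompt,
            f"Answer concisely: {user_prompt}",
            f"Answer step by step: {user_prompt}",
        ]

@dataclass
class OutputBroker:
    """Collects the plurality of h-LLM results and reduces them to a
    single responsive result."""
    def combine(self, results: List[str]) -> str:
        # Placeholder policy: keep the longest (most detailed) answer.
        return max(results, key=len)

def run_pipeline(context_files: List[str], user_prompt: str,
                 h_llms: List[HLLM]) -> str:
    broker_in = InputBroker(refiner=str.strip)   # trivial "refining" stub
    broker_out = OutputBroker()
    refined = broker_in.refine_context(context_files)   # refined context files
    prompts = broker_in.derive_prompts(user_prompt)     # derived prompts
    context = "\n".join(refined)
    # Fan each derived prompt out to each h-LLM, gather all results.
    results = [llm(p, context) for llm in h_llms for p in prompts]
    return broker_out.combine(results)                  # responsive result
```

The broker roles mirror the claim's separation of concerns: the input broker owns context refinement and prompt derivation before anything reaches the models, and the output broker owns result aggregation, so either policy can be swapped without touching the h-LLM hosting layer.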