US 12,405,979 B2
Method and system for optimizing use of retrieval augmented generation pipelines in generative artificial intelligence applications
Vijay Madisetti, Alpharetta, GA (US); and Arshdeep Bahga, Chandigarh (IN)
Assigned to Vijay Madisetti, Alpharetta, GA (US)
Filed by Vijay Madisetti, Alpharetta, GA (US)
Filed on Feb. 18, 2025, as Appl. No. 19/056,496.
Application 19/056,496 is a continuation in part of application No. 19/040,471, filed on Jan. 29, 2025.
Application 19/040,471 is a continuation in part of application No. 18/921,852, filed on Oct. 21, 2024.
Application 18/921,852 is a continuation in part of application No. 18/812,707, filed on Aug. 22, 2024.
Application 18/812,707 is a continuation in part of application No. 18/470,487, filed on Sep. 20, 2023, granted, now 12,147,461, issued on Nov. 19, 2024.
Application 18/470,487 is a continuation of application No. 18/348,692, filed on Jul. 7, 2023, granted, now 12,001,462, issued on Jun. 4, 2024.
Claims priority of provisional application 63/742,792, filed on Jan. 7, 2025.
Claims priority of provisional application 63/693,351, filed on Sep. 11, 2024.
Claims priority of provisional application 63/647,092, filed on May 14, 2024.
Claims priority of provisional application 63/607,647, filed on Dec. 8, 2023.
Claims priority of provisional application 63/607,112, filed on Dec. 7, 2023.
Claims priority of provisional application 63/535,118, filed on Aug. 29, 2023.
Claims priority of provisional application 63/534,974, filed on Aug. 28, 2023.
Claims priority of provisional application 63/529,177, filed on Jul. 27, 2023.
Claims priority of provisional application 63/469,571, filed on May 30, 2023.
Claims priority of provisional application 63/463,913, filed on May 4, 2023.
Prior Publication US 2025/0190461 A1, Jun. 12, 2025
Int. Cl. G06F 16/3329 (2025.01); G06F 40/284 (2020.01)
CPC G06F 16/3329 (2019.01) [G06F 40/284 (2020.01)] 30 Claims
OG exemplary drawing
 
1. A method of processing large contexts in a retrieval-augmented generation (RAG) system comprising:
receiving a query from a user;
retrieving a plurality of relevant documents from at least one database based on the query, wherein the relevant documents form a combined context;
partitioning the combined context into a plurality of context partitions;
generating a plurality of intermediate analysis results by:
processing each context partition of the plurality of context partitions using a mapper prompt; and
sending an output of the mapper prompt from each context partition to one or more large language models (LLMs);
generating a final response by processing the plurality of intermediate analysis results using a reducer prompt sent to one or more LLMs; and
transmitting the final response to the user.
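The claimed method follows a map-reduce pattern: the retrieved context is partitioned, a mapper prompt is applied to each partition via an LLM, and a reducer prompt consolidates the intermediate results into a final response. The steps of claim 1 can be sketched as follows; `llm_complete`, `partition_context`, and the prompt wording are hypothetical placeholders for illustration only, not part of the patent text, and a real system would substitute an actual retrieval step and LLM API call:

```python
def llm_complete(prompt: str) -> str:
    """Stand-in for a call to one or more LLMs; stubbed for illustration."""
    return f"[LLM output for: {prompt[:40]}]"


def partition_context(documents: list[str], partition_size: int) -> list[list[str]]:
    """Partition the combined context (retrieved documents) into partitions."""
    return [documents[i:i + partition_size]
            for i in range(0, len(documents), partition_size)]


def map_reduce_rag(query: str, documents: list[str], partition_size: int = 2) -> str:
    # Step 1-2 (receive query, retrieve documents) are assumed done;
    # `documents` is the combined context.
    partitions = partition_context(documents, partition_size)

    # Map phase: process each context partition with a mapper prompt,
    # sending each prompt's output request to an LLM.
    intermediate = [
        llm_complete(
            f"Analyze the following context with respect to the query "
            f"'{query}':\n" + "\n".join(part)
        )
        for part in partitions
    ]

    # Reduce phase: a reducer prompt consolidates the intermediate
    # analysis results into a single final response.
    reducer_prompt = (
        f"Combine these partial analyses into one answer to '{query}':\n"
        + "\n".join(intermediate)
    )
    return llm_complete(reducer_prompt)  # final response transmitted to the user


docs = [f"document {i}" for i in range(5)]
answer = map_reduce_rag("What is X?", docs)
```

With five retrieved documents and a partition size of two, the map phase issues three mapper calls, and the reduce phase issues one consolidating call, so long contexts never reach the model in a single prompt.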