US 12,405,978 B2
Method and system for optimizing use of retrieval augmented generation pipelines in generative artificial intelligence applications
Vijay Madisetti, Alpharetta, GA (US); and Arshdeep Bahga, Chandigarh (IN)
Assigned to Vijay Madisetti, Alpharetta, GA (US)
Filed by Vijay Madisetti, Alpharetta, GA (US)
Filed on Feb. 12, 2025, as Appl. No. 19/051,820.
Application 19/051,820 is a continuation in part of application No. 19/040,471, filed on Jan. 29, 2025.
Application 19/040,471 is a continuation in part of application No. 18/921,852, filed on Oct. 21, 2024.
Application 18/921,852 is a continuation in part of application No. 18/812,707, filed on Aug. 22, 2024.
Application 18/812,707 is a continuation in part of application No. 18/470,487, filed on Sep. 20, 2023, granted, now 12,147,461, issued on Nov. 19, 2024.
Application 18/470,487 is a continuation of application No. 18/348,692, filed on Jul. 7, 2023, granted, now 12,001,462, issued on Jun. 4, 2024.
Claims priority of provisional application 63/742,792, filed on Jan. 7, 2025.
Claims priority of provisional application 63/693,351, filed on Sep. 11, 2024.
Claims priority of provisional application 63/647,092, filed on May 14, 2024.
Claims priority of provisional application 63/607,647, filed on Dec. 8, 2023.
Claims priority of provisional application 63/607,112, filed on Dec. 7, 2023.
Claims priority of provisional application 63/535,118, filed on Aug. 29, 2023.
Claims priority of provisional application 63/534,974, filed on Aug. 28, 2023.
Claims priority of provisional application 63/529,177, filed on Jul. 27, 2023.
Claims priority of provisional application 63/469,571, filed on May 30, 2023.
Claims priority of provisional application 63/463,913, filed on May 4, 2023.
Prior Publication US 2025/0190460 A1, Jun. 12, 2025
Int. Cl. G06F 16/3329 (2025.01); G06F 40/284 (2020.01)

CPC G06F 16/3329 (2019.01) [G06F 40/284 (2020.01)]

24 Claims

1. A method of generating outputs in large language models (LLMs) comprising:

receiving one or more documents comprising textual content;

defining one or more contexts for the one or more documents comprising:

identifying at least one of a topic or a category associated with the textual content;

segmenting the textual content into one or more content chunks, each content chunk being associated with the at least one of topic or category;

assigning at least one tag to each content chunk of the one or more content chunks;

identifying one or more selected chunks from the one or more content chunks;

adding metadata to the one or more selected chunks; and

indexing the one or more selected chunks into an index;

receiving a query related to the one or more documents from a user; and

performing a response generation process comprising:

determining if a cache comprises information related to the query;

responsive to determining the cache comprises information related to the query, retrieving the information from the cache; and

responsive to determining the cache does not comprise the information, performing a search of the index to retrieve the information;

generating an augmented query by augmenting the query with information retrieved from at least one of the cache or the search; and

generating a response based on the augmented query;

evaluating the response for compliance with one or more criteria;

responsive to determining the response complies with the one or more criteria:

generating a final response; and

transmitting the final response to the user; and

responsive to determining the response does not comply with at least one criterion of the one or more criteria, performing a fine-tuning process comprising redefining a context of the one or more contexts.
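
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and every name in it (Chunk, define_context, search, generate, complies, Pipeline, STOPWORDS) is hypothetical. The sketch assumes that the topic is supplied by the caller, that tags are simply the word tokens of a chunk, that a chunk is "selected" when it carries at least two tags, that the LLM is replaced by a placeholder that echoes the retrieved context, that the sole criterion is coverage of the query's key terms, that the cache is populated only after a compliant response, and that "redefining a context" means re-segmenting with a doubled chunk size.

```python
import re
from dataclasses import dataclass, field

STOPWORDS = {"when", "are", "after", "the", "is", "a", "of"}  # assumed list


def tokens(text):
    """Lowercased word tokens; a stand-in for real text analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Chunk:
    text: str
    topic: str
    tags: set
    metadata: dict = field(default_factory=dict)


def define_context(doc_id, text, topic, chunk_size):
    """Segment text into chunks, tag them, select some, add metadata."""
    words = text.split()
    selected = []
    for start in range(0, len(words), chunk_size):
        piece = " ".join(words[start:start + chunk_size])
        chunk = Chunk(piece, topic, tags=tokens(piece))
        if len(chunk.tags) >= 2:  # hypothetical selection rule
            chunk.metadata = {"doc": doc_id, "offset": start}
            selected.append(chunk)
    return selected


def search(index, query, top_k=1):
    """Rank indexed chunks by tag overlap with the query."""
    q = tokens(query)
    ranked = sorted(index, key=lambda c: len(c.tags & q), reverse=True)
    return [c.text for c in ranked[:top_k] if c.tags & q]


def generate(augmented_query):
    """Placeholder for an LLM call: returns the supplied context verbatim."""
    return augmented_query.splitlines()[0][len("Context: "):]


def complies(response, query):
    """Assumed criterion: every non-stopword query term appears in the response."""
    return (tokens(query) - STOPWORDS) <= tokens(response)


class Pipeline:
    def __init__(self, docs, topic, chunk_size=4):
        self.docs, self.topic, self.chunk_size = docs, topic, chunk_size
        self.cache = {}
        self.reindex()

    def reindex(self):
        """Build the index from the selected chunks of every document."""
        self.index = [c for doc_id, text in self.docs.items()
                      for c in define_context(doc_id, text, self.topic,
                                              self.chunk_size)]

    def answer(self, query, max_rounds=3):
        key = " ".join(sorted(tokens(query)))
        for _ in range(max_rounds):
            if key in self.cache:                      # cache hit
                info, source = self.cache[key], "cache"
            else:                                      # cache miss: search index
                info, source = " ".join(search(self.index, query)), "index"
            augmented = f"Context: {info}\nQuestion: {query}"
            response = generate(augmented)
            if complies(response, query):
                self.cache[key] = info                 # cache only compliant info
                return response, source
            self.chunk_size *= 2                       # redefine the context
            self.reindex()
        return None, "unresolved"


POLICY = ("Refunds are issued within 30 days of purchase. "
          "Shipping takes five business days. Returns require a receipt.")

if __name__ == "__main__":
    pipeline = Pipeline({"policy": POLICY}, topic="store policy")
    print(pipeline.answer("When are refunds issued after purchase?"))
    print(pipeline.answer("When are refunds issued after purchase?"))
```

In this sketch the first retrieval, with four-word chunks, returns a fragment that omits a key query term, so the evaluation fails and the context is redefined with larger chunks before the query is retried. A repeated query is then served from the cache rather than the index.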