CPC G06F 16/3329 (2019.01) [G06F 40/284 (2020.01)] · 24 Claims

1. A method of generating outputs in large language models (LLMs) comprising:
    receiving one or more documents comprising textual content;
    defining one or more contexts for the one or more documents comprising:
        identifying at least one of a topic or a category associated with the textual content;
        segmenting the textual content into one or more content chunks, each content chunk being associated with the at least one of the topic or the category;
        assigning at least one tag to each content chunk of the one or more content chunks;
        identifying one or more selected chunks from the one or more content chunks;
        adding metadata to the one or more selected chunks; and
        indexing the one or more selected chunks into an index;
    receiving a query related to the one or more documents from a user; and
    performing a response generation process comprising:
        determining if a cache comprises information related to the query;
        responsive to determining the cache comprises the information related to the query, retrieving the information from the cache; and
        responsive to determining the cache does not comprise the information, performing a search of the index to retrieve the information;
        generating an augmented query by augmenting the query with the information retrieved from at least one of the cache or the search; and
        generating a response based on the augmented query;
        evaluating the response for compliance with one or more criteria;
        responsive to determining the response complies with the one or more criteria:
            generating a final response; and
            transmitting the final response to the user; and
        responsive to determining the response does not comply with at least one criterion of the one or more criteria, performing a fine-tuning process comprising redefining a context of the one or more contexts.
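The claimed method can be illustrated with a minimal, self-contained sketch. This is not the patented implementation; every name below (`Chunk`, `Pipeline`, `segment`, the keyword index, the compliance callback) is a hypothetical stand-in chosen for readability, and the claim's fine-tuning step is approximated by simply invalidating the cached context before retrying.

```python
# Illustrative sketch only: all classes and functions here are hypothetical
# stand-ins for the claimed steps, not from any real library or the patent.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    topic: str                                   # topic/category for the source text
    tags: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

def segment(text, topic, size=40):
    """Segment textual content into chunks tied to the identified topic."""
    words = text.split()
    return [Chunk(" ".join(words[i:i + size]), topic)
            for i in range(0, len(words), size)]

class Pipeline:
    def __init__(self):
        self.index = {}                          # tag -> list of chunks
        self.cache = {}                          # query -> retrieved information

    def ingest(self, text, topic):
        """Define a context: segment, tag, add metadata, and index chunks."""
        for chunk in segment(text, topic):
            chunk.tags = [topic] + [w.lower() for w in chunk.text.split()[:3]]
            chunk.metadata = {"topic": topic, "length": len(chunk.text)}
            for tag in chunk.tags:
                self.index.setdefault(tag, []).append(chunk)

    def retrieve(self, query):
        """Return cached information, or search the index on a cache miss."""
        if query in self.cache:
            return self.cache[query]
        hits = []
        for word in query.lower().split():
            hits.extend(c.text for c in self.index.get(word, []))
        info = " ".join(dict.fromkeys(hits))     # de-duplicate, keep order
        self.cache[query] = info
        return info

    def respond(self, query, generate, complies, max_rounds=3):
        """Augment the query, generate, evaluate, and retry on non-compliance."""
        response = ""
        for _ in range(max_rounds):
            augmented = f"{query}\n\nContext: {self.retrieve(query)}"
            response = generate(augmented)       # stand-in for the LLM call
            if complies(response):
                return response                  # compliant: final response
            self.cache.pop(query, None)          # crude "redefine context" step
        return response
```

A usage pass ingests one document, answers from the index on the first query, and serves the cache thereafter; `generate` and `complies` are injected callables so the sketch stays independent of any particular model.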