US 12,259,913 B1
Caching large language model (LLM) responses using hybrid retrieval and reciprocal rank fusion
Brien H Muschett, Palm Beach Gardens, FL (US); and Justin G Odom, Jupiter, FL (US)
Assigned to Inventus Holdings, LLC, Juno Beach, FL (US)
Filed by Inventus Holdings, LLC, Juno Beach, FL (US)
Filed on Feb. 14, 2024, as Appl. No. 18/441,863.
Int. Cl. G06F 16/332 (2019.01); G06F 16/33 (2019.01); G06F 16/3329 (2025.01); G06F 16/334 (2025.01); G06F 16/335 (2019.01); G06F 16/38 (2019.01)

CPC G06F 16/3326 (2019.01) [G06F 16/3329 (2019.01); G06F 16/3347 (2019.01); G06F 16/335 (2019.01); G06F 16/38 (2019.01)]

16 Claims

1. A method for improving computer functionality by retrieving answers to questions from a cache, the method comprising:

using a hardware processor communicatively coupled to memory to perform accessing a question stored in primary storage communicatively coupled to a cache, in a text format;

accessing metadata associated with the question in the text format;

vectorizing the question in the text format into a high dimensional vector using a text embedding algorithm, wherein the high dimensional vector is greater than or equal to 1024 dimensions;

using the high dimensional vector to search a question portion of the cache using a plurality of retriever types to create a hybrid search, in which the hybrid search combines one or more text format queries using the metadata with one or more high dimensional vector queries in a single search request;

performing query filtering with the metadata associated with the question to provide a semantic layer set of semantic answers in a text format with metadata associated with an answer and semantic relevance values;

using the question in the text format to search an answer portion of the cache and performing query filtering with the metadata associated with the question to provide a lexical layer set of lexical answers in the text format with the metadata associated with the answer and lexical relevance values;

using the semantic layer set in order of the semantic relevance values from highest to lowest and the lexical layer set from highest to lowest; and

applying a reciprocal rank fusion algorithm to compute a combined ranking set for the semantic answers in the text format and the lexical answers in the text format to provide an identified answer.
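
The final two steps of claim 1 (ordering each layer set by relevance, then fusing the two rankings) can be illustrated with a minimal sketch. This is not the patentee's implementation. The function names, the answer identifiers, the sample relevance values, and the smoothing constant k = 60 (a value commonly used with reciprocal rank fusion) are all assumptions made for illustration; the claim itself does not specify them.

```python
from collections import defaultdict


def order_by_relevance(layer_set):
    """Return answer ids sorted from highest to lowest relevance value.

    layer_set is a list of (answer_id, relevance_value) pairs (hypothetical shape).
    """
    return [aid for aid, _ in sorted(layer_set, key=lambda p: p[1], reverse=True)]


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several rankings: score(d) = sum over lists of 1 / (k + rank(d)).

    Ranks start at 1. k = 60 is an assumed constant, not taken from the claim.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, answer_id in enumerate(ranked, start=1):
            scores[answer_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))


# Hypothetical results from the two retrievers (values are illustrative only).
semantic_layer = [("a1", 0.92), ("a2", 0.85), ("a3", 0.40)]  # vector similarity
lexical_layer = [("a2", 12.1), ("a4", 9.7), ("a1", 3.3)]     # keyword score

fused = reciprocal_rank_fusion(
    [order_by_relevance(semantic_layer), order_by_relevance(lexical_layer)]
)
identified_answer = fused[0]
print(fused, identified_answer)
```

Because the fusion uses only rank positions, the semantic and lexical relevance values never need to be placed on a common scale; an answer that ranks well in both layers (here "a2") rises above one that ranks first in only a single layer.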