CPC G06F 16/3329 (2019.01) [G06F 40/284 (2020.01)]; 28 Claims
1. A method of answering queries using one or more families of large language models (h-LLMs) by a computer comprising a processor, a non-transitory storage medium, and software on the storage medium, the method comprising:
receiving a user prompt at a user interface;
generating a plurality of derived prompts from the user prompt at an input broker;
generating a plurality of prompt embeddings from the plurality of derived prompts by applying a plurality of embedding models;
transmitting the plurality of prompt embeddings to a vector database, the vector database comprising a database of knowledge documents, each knowledge document in the database of knowledge documents having one or more embeddings associated therewith;
receiving one or more knowledge documents that are determined to be relevant to the plurality of prompt embeddings at the input broker;
generating a plurality of context-aware prompts by the input broker responsive to the user prompt, the plurality of derived prompts, and the one or more knowledge documents;
transmitting the plurality of context-aware prompts to the one or more h-LLMs;
receiving a plurality of h-LLM results at an output broker, the h-LLM results being generated responsive to the one or more h-LLMs receiving at least one context-aware prompt and generating a response thereto;
processing the plurality of h-LLM results by the output broker to produce processed h-LLM results, each processed h-LLM result having a score;
identifying one or more preferred results responsive to the scores; and
transmitting the one or more preferred results to a user via the user interface.
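The claimed method can be read as a retrieval-augmented, multi-model query pipeline: an input broker derives prompts and embeds them, a vector database returns relevant knowledge documents, context-aware prompts go to the h-LLMs, and an output broker scores the results and returns the preferred one. A minimal illustrative sketch follows; every name here (`derive_prompts`, the toy letter-count "embedding models", `VectorDB`, the length-based score, the stand-in h-LLMs) is a hypothetical placeholder chosen for runnability, not the patent's actual implementation.

```python
# Illustrative sketch of the claimed pipeline; all components are toy stand-ins.
import math

def derive_prompts(user_prompt):
    # Input broker step: generate a plurality of derived prompts (trivial rephrasings).
    return [user_prompt, "Explain: " + user_prompt, "Summarize: " + user_prompt]

def embed_low(text):
    # Toy "embedding model" #1: counts of letters a-m.
    return [text.lower().count(c) for c in "abcdefghijklm"]

def embed_high(text):
    # Toy "embedding model" #2: counts of letters n-z.
    return [text.lower().count(c) for c in "nopqrstuvwxyz"]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class VectorDB:
    # Each knowledge document is stored with one embedding per embedding model.
    def __init__(self, documents, models):
        self.entries = [(d, [m(d) for m in models]) for d in documents]

    def search(self, prompt_embeddings, top_k=2):
        # prompt_embeddings: list of (model_index, vector) pairs; rank each
        # document by its best similarity to any prompt embedding from the
        # matching model, and return the top_k documents.
        scored = []
        for doc, doc_embs in self.entries:
            best = max(cosine(vec, doc_embs[i]) for i, vec in prompt_embeddings)
            scored.append((best, doc))
        scored.sort(reverse=True)
        return [doc for _, doc in scored[:top_k]]

def answer(user_prompt, db, models, h_llms, top_k=2):
    derived = derive_prompts(user_prompt)                              # derived prompts
    embs = [(i, m(p)) for p in derived for i, m in enumerate(models)]  # prompt embeddings
    docs = db.search(embs, top_k)                                      # relevant knowledge documents
    ctx = ["Context: " + " ".join(docs) + "\nQ: " + p for p in derived]  # context-aware prompts
    results = [llm(cp) for llm in h_llms for cp in ctx]                # h-LLM results
    # Output broker: score each result (here, naively, by length) and
    # return the preferred (highest-scoring) result.
    return max(results, key=len)

# Demonstration with stand-in knowledge documents and stand-in h-LLMs.
documents = ["quantum computing uses qubits", "classical bits are 0 or 1"]
models = [embed_low, embed_high]
db = VectorDB(documents, models)
h_llms = [lambda p: "A: " + p[-20:], lambda p: "B: " + p[-10:]]
preferred = answer("what is a qubit", db, models, h_llms)
```

The sketch keeps the claim's separation of roles: prompt derivation and embedding on the input side, document retrieval in the middle, and result scoring on the output side, so any single stage (embedding models, retrieval ranking, scoring rule) can be swapped without touching the others.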