US 12,321,370 B2
Method and system for multi-level artificial intelligence supercomputer design featuring sequencing of large language models
Vijay Madisetti, Alpharetta, GA (US); and Arshdeep Bahga, Chandigarh (IN)
Assigned to Vijay Madisetti, Alpharetta, GA (US)
Filed by Vijay Madisetti, Alpharetta, GA (US)
Filed on Aug. 12, 2024, as Appl. No. 18/801,421.
Application 18/801,421 is a continuation of application No. 18/470,487, filed on Sep. 20, 2023, granted, now 12,147,461.
Application 18/470,487 is a continuation of application No. 18/348,692, filed on Jul. 7, 2023, granted, now 12,001,462.
Claims priority of provisional application 63/469,571, filed on May 30, 2023.
Claims priority of provisional application 63/463,913, filed on May 4, 2023.
Prior Publication US 2024/0403338 A1, Dec. 5, 2024
Int. Cl. G06F 16/3329 (2025.01); G06F 40/284 (2020.01)
CPC G06F 16/3329 (2019.01) [G06F 40/284 (2020.01)] 30 Claims
OG exemplary drawing
 
1. A method for assigning tasks to LLMs using one or more families of large language models (h-LLMs) by a computer comprising a processor, a non-transitory storage medium, and software on the storage medium, the method comprising:
receiving a received prompt at a user interface via an API;
generating a plurality of derived prompts from the received prompt at an input broker based on one or more specialized tasks, each derived prompt of the plurality of derived prompts corresponding to at least one specialized task of the one or more specialized tasks and belonging to one or more categories;
generating a plurality of prompt embeddings from the plurality of derived prompts by applying a plurality of embedding models;
transmitting the plurality of prompt embeddings to a vector database, the vector database comprising a database of knowledge documents, each knowledge document in the database of knowledge documents having one or more embeddings associated therewith;
receiving one or more received knowledge documents from the vector database determined to be relevant to the plurality of prompt embeddings at the input broker;
generating a plurality of context-aware prompts by the input broker responsive to at least one of the received prompt, the plurality of derived prompts, and the one or more received knowledge documents, each context-aware prompt of the plurality of context-aware prompts corresponding to a specialized task of the one or more specialized tasks;
transmitting each context-aware prompt of the plurality of context-aware prompts to a respective h-LLM of a plurality of h-LLMs that is configured to specialize in processing prompts having a specialty corresponding to the specialized task of the context-aware prompt; and
receiving a plurality of produced results from at least one of the plurality of h-LLMs by an output broker;
wherein the plurality of h-LLMs operate as a network of communicating h-LLMs;
wherein at least one of the received prompt or the plurality of produced results is processed and transmitted to another h-LLM of the plurality of h-LLMs in the network; and
wherein at least one of the input broker and the output broker is configured to coordinate the plurality of context-aware prompts to be processed by a sequence of h-LLMs of the plurality of h-LLMs in response to at least one of the received prompt or the derived prompts.
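
The input-broker steps of claim 1 (deriving task-specific prompts from a received prompt, applying a plurality of embedding models, retrieving relevant knowledge documents from a vector database, and assembling context-aware prompts) can be read as the following minimal Python sketch. Every name here (toy_embed, VectorDatabase, InputBroker) is a hypothetical illustration rather than the patented implementation, and the hash-based vectors merely stand in for real embedding models.

```python
# Minimal, self-contained sketch of the claimed input-broker flow.
# All class/function names are hypothetical; the "embedding models" here
# are toy hash-based vectors standing in for real embedding models.
import hashlib
import math
from dataclasses import dataclass, field


def toy_embed(text: str, dim: int = 16, seed: int = 0) -> list[float]:
    """Deterministic stand-in for an embedding model (assumption, not a real model)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.sha256(f"{seed}:{token}".encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class VectorDatabase:
    """Knowledge documents, each stored with one or more embeddings."""
    docs: dict[str, list[list[float]]] = field(default_factory=dict)

    def add(self, doc: str, embeddings: list[list[float]]) -> None:
        self.docs[doc] = embeddings

    def query(self, prompt_embeddings: list[list[float]], top_k: int = 2) -> list[str]:
        # Rank documents by their best cosine-style match against any prompt embedding.
        def best_score(doc_embs: list[list[float]]) -> float:
            return max(
                sum(a * b for a, b in zip(p, d))
                for p in prompt_embeddings for d in doc_embs
            )
        ranked = sorted(self.docs, key=lambda doc: best_score(self.docs[doc]), reverse=True)
        return ranked[:top_k]


class InputBroker:
    """Derives task-specific prompts and assembles context-aware prompts."""

    def __init__(self, specialized_tasks: list[str], db: VectorDatabase):
        self.tasks = specialized_tasks
        self.db = db

    def derive_prompts(self, received_prompt: str) -> dict[str, str]:
        # One derived prompt per specialized task (simplistic template).
        return {task: f"[{task}] {received_prompt}" for task in self.tasks}

    def context_aware_prompts(self, received_prompt: str) -> dict[str, str]:
        derived = self.derive_prompts(received_prompt)
        # Apply a plurality of (toy) embedding models to the derived prompts.
        prompt_embeddings = [toy_embed(p, seed=s) for p in derived.values() for s in (0, 1)]
        knowledge = self.db.query(prompt_embeddings)
        context = " | ".join(knowledge)
        return {task: f"Context: {context}\nTask: {p}" for task, p in derived.items()}


if __name__ == "__main__":
    db = VectorDatabase()
    db.add("Doc about code review.", [toy_embed("code review", seed=0)])
    db.add("Doc about legal contracts.", [toy_embed("legal contracts", seed=0)])
    broker = InputBroker(["summarize", "critique"], db)
    for task, prompt in broker.context_aware_prompts("Review this code change").items():
        print(task, "->", prompt)
```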
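
A similarly hedged sketch of the routing and collection steps: each context-aware prompt is transmitted to an h-LLM configured to specialize in the corresponding task, and an output broker receives the produced results. The HLLM stubs and the simple concatenating OutputBroker below are assumptions for illustration only; in practice each entry would wrap a real model endpoint.

```python
# Hypothetical sketch of routing context-aware prompts to specialized h-LLMs
# and collecting produced results at an output broker.
from typing import Callable

HLLM = Callable[[str], str]  # a specialized model, abstracted as prompt -> result


def make_stub_hllm(specialty: str) -> HLLM:
    # Stand-in for a model that specializes in one task.
    return lambda prompt: f"<{specialty} result for: {prompt[:40]}...>"


class OutputBroker:
    """Receives produced results from the h-LLMs and merges them."""

    def collect(self, results: dict[str, str]) -> str:
        return "\n".join(f"{task}: {out}" for task, out in results.items())


def route_and_collect(context_prompts: dict[str, str],
                      hllm_pool: dict[str, HLLM],
                      output_broker: OutputBroker) -> str:
    # Each context-aware prompt goes to the h-LLM specializing in its task.
    results = {task: hllm_pool[task](prompt) for task, prompt in context_prompts.items()}
    return output_broker.collect(results)


if __name__ == "__main__":
    pool = {"summarize": make_stub_hllm("summarize"), "critique": make_stub_hllm("critique")}
    prompts = {"summarize": "Context: ...\nTask: [summarize] Review this code change",
               "critique": "Context: ...\nTask: [critique] Review this code change"}
    print(route_and_collect(prompts, pool, OutputBroker()))
```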
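
Finally, the network of communicating h-LLMs and the broker-coordinated sequence of h-LLMs recited in the wherein clauses can be pictured as a chain in which each produced result is processed and transmitted to the next model. The chain_hllms helper below is a hypothetical sketch under that reading, not the claimed system itself.

```python
# Hypothetical sketch of a sequence of communicating h-LLMs: each produced
# result is processed and transmitted to the next h-LLM in the chain.
from typing import Callable

HLLM = Callable[[str], str]


def chain_hllms(received_prompt: str,
                sequence: list[HLLM],
                process: Callable[[str], str] = lambda s: s) -> str:
    """Feed the prompt through a sequence of h-LLMs; each output is processed
    before being passed on as the next model's input."""
    current = received_prompt
    for hllm in sequence:
        current = process(hllm(current))
    return current


if __name__ == "__main__":
    drafter = lambda p: f"draft({p})"
    refiner = lambda p: f"refined({p})"
    verifier = lambda p: f"verified({p})"
    print(chain_hllms("Explain the design", [drafter, refiner, verifier]))
```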