US 12,254,279 B1
Dynamic resource allocation of large language model deployments for conversational interface
Seth Cohen, Southwest Ranches, FL (US); Heather Widler, Melbourne, FL (US); John A. Torres, Kissimmee, FL (US); and John D. Doak, Tulsa, OK (US)
Assigned to HONESTY INNOVATIONS HOLDINGS, LLC, Deerfield Beach, FL (US)
Filed by Honesty Innovations Holdings, LLC, Deerfield Beach, FL (US)
Filed on May 23, 2024, as Appl. No. 18/672,689.
Int. Cl. G06F 40/35 (2020.01); G06F 9/445 (2018.01)
CPC G06F 40/35 (2020.01) [G06F 9/44536 (2013.01)] 19 Claims
OG exemplary drawing
 
1. A dynamic conversation interface system, comprising:
computing hardware, including at least one processor, data storage, and input/output facilities;
wherein the computing hardware stores instructions that, when executed by the at least one processor, cause the at least one processor to implement:
a plurality of instances of a large language model (LLM) engine, wherein each LLM engine instance is configured according to a respective set of system directives;
an agent manager engine operative to instantiate and configure the instances of the LLM engine such that a first LLM engine instance is configured according to a first set of system directives, and a second LLM engine instance is configured according to a second set of system directives that is different from the first set,
wherein the first LLM engine instance has a different functional specialization from the second LLM engine instance,
wherein the first LLM engine instance and the second LLM engine instance engage in a same conversation session with a user to perform different specializations within that conversation session;
wherein the first set of system directives includes directives that determine occurrence of a defined condition for instantiating the second LLM engine instance, and wherein the agent manager engine is further operative to instantiate the second LLM engine instance in response to the occurrence of the defined condition; and
wherein each LLM engine instance comprises:
a buffer memory that is operative to temporarily store recent history of a current conversation in which the LLM engine instance is engaged; and
a context window that stores a selected subset of information from the buffer memory which represents context of a defined recent portion of the current conversation.
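The architecture recited in claim 1 can be illustrated with a minimal sketch, assuming a stubbed (non-LLM) engine. All names here (`AgentManager`, `LLMEngineInstance`, the `spawn_on` directive key) are hypothetical illustrations, not part of the patent: the first instance's directives define the condition under which the manager instantiates a second, differently specialized instance into the same conversation session, and each instance keeps a buffer memory of recent history plus a context window drawn from a defined recent portion of that buffer.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class LLMEngineInstance:
    """One LLM engine instance, configured by its set of system directives (names hypothetical)."""
    name: str
    system_directives: list
    # Buffer memory: temporarily stores recent history of the current conversation.
    buffer_memory: deque = field(default_factory=lambda: deque(maxlen=50))
    # Number of recent buffer entries that form the context window.
    context_size: int = 4

    def observe(self, utterance: str) -> None:
        self.buffer_memory.append(utterance)

    def context_window(self) -> list:
        """Selected subset of buffer memory: the defined recent portion of the conversation."""
        return list(self.buffer_memory)[-self.context_size:]

class AgentManager:
    """Instantiates and configures LLM engine instances with distinct directive sets."""
    def __init__(self):
        self.instances = {}

    def instantiate(self, name: str, directives: list) -> LLMEngineInstance:
        inst = LLMEngineInstance(name, directives)
        self.instances[name] = inst
        return inst

    def check_directives(self, first: LLMEngineInstance, utterance: str):
        """If a directive's defined condition occurs, instantiate the second instance."""
        for d in first.system_directives:
            if d.get("spawn_on") and d["spawn_on"] in utterance.lower():
                return self.instantiate(d["agent_name"], d["agent_directives"])
        return None

# Usage: the first instance's directives name the condition ("billing")
# that triggers a billing-specialist second instance.
mgr = AgentManager()
general = mgr.instantiate("general", [
    {"role": "You are a general support agent."},
    {"spawn_on": "billing",
     "agent_name": "billing",
     "agent_directives": [{"role": "You are a billing specialist."}]},
])

for turn in ["Hi, I need help.", "It's about a billing error."]:
    general.observe(turn)
    spawned = mgr.check_directives(general, turn)
    if spawned:
        # Both instances engage in the same conversation session.
        for past in general.buffer_memory:
            spawned.observe(past)

print(sorted(mgr.instances))                      # ['billing', 'general']
print(mgr.instances["billing"].context_window())
```

The design choice this sketch highlights is that the spawning condition lives in the first instance's own directive set, so the agent manager needs no hard-coded routing logic; it only evaluates directives and instantiates the specialized instance on demand.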