US 12,067,366 B1
Generative text model query system
Jake Heller, San Mateo, CA (US); Pablo Arredondo, Palo Alto, CA (US); Walter DeFoor, Rockville, MD (US); Ryan Walker, Lancaster, PA (US); and Javed Qadrud-Din, Union City, CA (US)
Assigned to Casetext, Inc., San Francisco, CA (US)
Filed by Casetext, Inc., San Francisco, CA (US)
Filed on Feb. 15, 2023, as Appl. No. 18/169,701.
Int. Cl. G06F 40/35 (2020.01)
CPC G06F 40/35 (2020.01)
19 Claims
 
1. A method comprising:
training a machine learning model to identify text portions from one or more input documents;
determining a plurality of text portions based on the one or more input documents using the machine learning model, the text portions including a respective number of words below a designated chunk threshold;
determining a first plurality of text generation prompts based on a text generation prompt template, each of the first plurality of text generation prompts including: (1) a respective text portion of the plurality of text portions, (2) a plurality of natural language questions related to the respective text portion, and (3) a first natural language instruction to answer the plurality of natural language questions based on the text portion, the first natural language instruction being included in the text generation prompt template;
transmitting one or more first text generation prompt messages including the plurality of text generation prompts to a remote text generation modeling system via a communication interface;
receiving a first plurality of text generation prompt response messages from the remote text generation modeling system via the communication interface, the first plurality of text generation prompt response messages including first respective novel text portions generated by a text generation model implemented at the remote text generation modeling system;
identifying one or more factual assertions in the respective novel text portions generated by the text generation model;
determining one or more search terms associated with the one or more factual assertions;
executing a search query to identify one or more search results based on the one or more search terms;
evaluating the one or more factual assertions against the one or more search results;
identifying at least one factual assertion of the one or more factual assertions as a hallucination generated by the text generation model;
determining one or more second text generation prompts based on the hallucination including a second language instruction to correct the at least one factual assertion identified as the hallucination;
transmitting one or more second text generation prompt messages including the one or more second text generation prompts;
receiving one or more second text generation prompt response messages from the remote text generation modeling system via the communication interface, the one or more second text generation prompt response messages including second respective novel text portions generated by the text generation model including a correction to the at least one factual assertion;
parsing the first plurality of text generation prompt response messages and the one or more second text generation prompt response messages via a processor to generate a plurality of answers corresponding with the plurality of natural language questions, wherein generating the plurality of answers involves determining a text consolidation prompt based on the plurality of text generation prompt response messages and a text consolidation prompt template, the text consolidation prompt including a third natural language instruction to consolidate some or all of the novel text portions; and
transmitting an output message to a client machine via the communication interface, the output message being determined based on the plurality of answers.
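
The prompt-construction steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name, template string, and threshold below is hypothetical. The claim recites a trained machine learning model for determining text portions; this sketch swaps in a plain fixed-size word splitter. The search step is represented by a caller-supplied function, and an assertion is treated as unsupported simply when that function returns no results, which is a much cruder test than the evaluation the claim describes. No remote text generation system is called.

```python
from typing import Callable

# Hypothetical templates; the claim only requires that the instruction
# be part of the template, not any particular wording.
PROMPT_TEMPLATE = (
    "Answer each question using only the text below.\n\n"
    "Text:\n{text}\n\nQuestions:\n{questions}"
)
CORRECTION_TEMPLATE = (
    "The following answer contains an unsupported statement.\n\n"
    "Answer:\n{answer}\n\nUnsupported statement:\n{assertion}\n\n"
    "Rewrite the answer so that the statement is corrected or removed."
)


def split_into_portions(document: str, chunk_threshold: int) -> list[str]:
    """Split a document into portions with fewer than chunk_threshold words.

    Assumption: a fixed-size word splitter stands in for the trained model.
    """
    if chunk_threshold < 2:
        raise ValueError("chunk_threshold must be at least 2")
    words = document.split()
    size = chunk_threshold - 1  # strictly below the threshold
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_prompts(portions: list[str], questions: list[str]) -> list[str]:
    """Build one prompt per portion: the portion, the questions, the instruction."""
    numbered = "\n".join(f"{n}. {q}" for n, q in enumerate(questions, start=1))
    return [PROMPT_TEMPLATE.format(text=p, questions=numbered) for p in portions]


def find_unsupported(
    assertions: list[str], search: Callable[[str], list[str]]
) -> list[str]:
    """Return assertions for which the supplied search function finds nothing."""
    return [a for a in assertions if not search(a)]


def build_correction_prompt(answer: str, assertion: str) -> str:
    """Build a second prompt asking for the flagged assertion to be corrected."""
    return CORRECTION_TEMPLATE.format(answer=answer, assertion=assertion)
```

In use, the prompts returned by `build_prompts` would be sent to the text generation system, factual assertions extracted from its responses would be passed to `find_unsupported`, and each flagged assertion would yield one correction prompt. Extracting the assertions and consolidating the answers are left out of this sketch.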