US 12,353,469 B1
Verification and citation for language model outputs
Ladan Mahabadi, Seattle, WA (US); Alexander Illichmann, Seattle, WA (US); Tong Ge, Belleville, NJ (US); Sudhir Hassan Manikya Raju, Bellevue, WA (US); Seema Yadav, Seattle, WA (US); Stebin Kodiamkunnel Sevichan, Everett, WA (US); Michiel David De Pooter, Seattle, WA (US); and Francesco Furno, New York, NY (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 28, 2024, as Appl. No. 18/759,778.
Int. Cl. G06F 16/00 (2019.01); G06F 16/332 (2019.01); G06F 16/334 (2025.01); G06F 16/383 (2019.01); G06Q 50/18 (2012.01)

CPC G06F 16/383 (2019.01) [G06F 16/332 (2019.01); G06F 16/3344 (2019.01); G06Q 50/18 (2013.01)]

20 Claims

1. A system operating on a provider network, wherein the system comprises:

one or more computer processors;

at least one data store having a knowledge base associated with a user stored thereon, wherein the knowledge base comprises a plurality of documents associated with the user;

a large language model executed by the one or more computer processors, wherein the large language model is trained based at least in part on a corpus of data, and wherein the corpus of data does not include any of the plurality of documents associated with the user; and

a querying and output validation service executed by the one or more computer processors, wherein the querying and output validation service is configured to perform operations comprising:

extracting text from the plurality of documents associated with the user;

partitioning the extracted text into a plurality of text chunks;

augmenting the plurality of text chunks with temporal metadata;

indexing the augmented plurality of text chunks in a searchable database;

receiving at least a free-form description of a question from the user;

converting the free-form description of the question to a first query;

comparing an embedding generated based at least in part on the first query to each of a plurality of embeddings generated based on the indexed augmented plurality of text chunks according to a similarity analysis, wherein the embedding and each of the plurality of embeddings is in a common vector space;

selecting at least a subset of the indexed augmented plurality of text chunks based at least in part on the similarity analysis;

generating, using the large language model, a first response to the question based at least in part on the subset of the indexed augmented plurality of text chunks, wherein the first response comprises a first set of data points identified based at least in part on the temporal metadata;

identifying one of the plurality of documents associated with the user including at least one of the first set of data points identified based at least in part on the temporal metadata; and

providing the first response to the question and the one of the plurality of documents associated with the user to the user.
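
For illustration only, the operations recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the bag-of-words `embed` function stands in for a learned embedding model, the in-memory list stands in for the searchable database, the large language model call is omitted, and every name here (`Chunk`, `partition`, `select_chunks`, `cite`, the sample documents and their dates) is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # document the chunk was extracted from
    text: str    # chunk text
    date: str    # temporal metadata as an ISO date (assumed representation)


def partition(doc_id, text, date, size=12):
    """Split extracted text into chunks of `size` words, each augmented with a date."""
    words = text.split()
    return [Chunk(doc_id, " ".join(words[i:i + size]), date)
            for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors in the same space."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def select_chunks(query, index, k=2):
    """Rank every indexed chunk against the query embedding and keep the top k."""
    q = embed(query)
    return sorted(index, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


def cite(data_point, chunks):
    """Return ids of documents whose selected chunks contain the data point."""
    return sorted({c.doc_id for c in chunks if data_point.lower() in c.text.lower()})


# Hypothetical knowledge base: document id -> (extracted text, document date).
documents = {
    "lease_2023.txt": ("The monthly rent is 1200 dollars starting January 2023.", "2023-01-01"),
    "lease_2024.txt": ("The monthly rent is 1350 dollars starting January 2024.", "2024-01-01"),
    "memo.txt": ("Parking permits are renewed every spring by the office.", "2024-03-15"),
}

# Extract, partition, augment, and index (here: a plain list).
index = [chunk for doc_id, (text, date) in documents.items()
         for chunk in partition(doc_id, text, date)]

# Retrieve the chunks most similar to the query.
selected = select_chunks("What is the monthly rent?", index, k=2)

# Use the temporal metadata to prefer the most recent selected chunk.
latest = max(selected, key=lambda c: c.date)

# A real system would pass `selected` to the language model; here the data
# point "1350" is assumed to appear in the model's response.
sources = cite("1350", selected)
print(latest.doc_id, sources)
```

In this sketch the citation step is a verbatim substring match against the retrieved chunks, which is only one possible way to identify a document that includes a data point from the response.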