US 12,455,910 B1
Controlled probabilistic sentence space expansion
Samuel Atkins, Vancouver (CA); Giacomo Domeniconi, Miami, FL (US); and Ali Fathi, San Francisco, CA (US)
Assigned to U.S. BANCORP, NATIONAL ASSOCIATION, Minneapolis, MN (US)
Filed by U.S. Bancorp, National Association, Minneapolis, MN (US)
Filed on May 19, 2025, as Appl. No. 19/212,571.
Int. Cl. G06F 16/3332 (2025.01); G06F 16/334 (2025.01)

CPC G06F 16/3332 (2019.01) [G06F 16/334 (2019.01)]

20 Claims

1. A system for reducing hallucinations in a large language model using probabilistic sentence space expansion, the system comprising:

one or more processors configured by computer-readable media to:

identify a plurality of input-output pairs, each input-output pair assigned to a probability and comprising an example input text string and a corresponding output text string, the example input text string and the corresponding output text of an input-output pair each comprising a matching variable;

sample a set of input-output pairs from the plurality of input-output pairs based on the probabilities assigned to the plurality of input-output pairs;

generate a list of aggregated input-output pairs from the set of input-output pairs by:

concatenating each of the example input text strings of the set of input-output pairs in a plurality of orders; and

concatenating each of the corresponding output text strings in orders corresponding to the orders of the concatenated example input text strings;

for each aggregated input-output pair of the list of aggregated input-output pairs, generate one or more queries by:

identifying each variable included in the aggregated input-output pair;

retrieving one or more values for each identified variable from a database; and

iteratively replacing each identified variable within the aggregated input-output pair with a different value of the retrieved one or more values for the identified variable to generate a different query of the one or more queries; and

train, using the one or more queries for each aggregated input-output pair, a first large language model to convert input queries to machine-readable prompts configured for input into a second large language model and input the machine-readable prompts into the second large language model.
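
The data-expansion steps recited in claim 1 (probability-weighted sampling, concatenation in multiple orders, and variable substitution) can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. The template pairs, the `{name}` placeholder syntax, the `VALUES` dictionary standing in for the database, the separators used when concatenating, and all function names are assumptions made for illustration. "A plurality of orders" is read here as every permutation of the sampled set, and the iterative replacement is read as one query per combination of retrieved values. The training of the first large language model is not shown.

```python
import itertools
import random
import re

# Hypothetical template pairs: (input template, output template, probability).
PAIRS = [
    ("What is the balance of {account}?", "get_balance(account={account})", 0.5),
    ("Show transactions since {date}.", "list_transactions(since={date})", 0.3),
    ("Who owns {account}?", "get_owner(account={account})", 0.2),
]

# Hypothetical stand-in for the database of values for each variable.
VALUES = {"account": ["checking", "savings"], "date": ["2024-01-01", "2024-06-30"]}

VAR = re.compile(r"\{(\w+)\}")  # assumed placeholder syntax: {name}


def sample_pairs(pairs, k, rng):
    """Draw up to k distinct pairs, weighted by their assigned probabilities."""
    pool = list(pairs)
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [p[2] for p in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))
    return chosen


def aggregate(sampled):
    """Concatenate inputs in every order, and outputs in the matching order."""
    aggregated = []
    for perm in itertools.permutations(sampled):
        agg_in = " ".join(p[0] for p in perm)
        agg_out = "; ".join(p[1] for p in perm)
        aggregated.append((agg_in, agg_out))
    return aggregated


def expand(agg_in, agg_out, values):
    """Substitute retrieved values for each variable, one query per combination."""
    names = sorted(set(VAR.findall(agg_in)) | set(VAR.findall(agg_out)))
    queries = []
    for combo in itertools.product(*(values[n] for n in names)):
        binding = dict(zip(names, combo))

        def fill(match):
            # The same value is used in the input and the output text.
            return binding[match.group(1)]

        queries.append((VAR.sub(fill, agg_in), VAR.sub(fill, agg_out)))
    return queries


# Build (query, prompt) training examples from two sampled pairs.
rng = random.Random(0)
sampled = sample_pairs(PAIRS, 2, rng)
training_examples = [q for agg in aggregate(sampled) for q in expand(*agg, VALUES)]
```

In this sketch each element of `training_examples` is a (natural-language query, machine-readable prompt) tuple of the kind the claim uses to train the first model. Enumerating all permutations grows factorially with the sample size, so a practical variant would likely cap or subsample the orders.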