US 12,457,239 B1
Malicious prompt management for large language models
Itsik Yizhak Mantin, Shoham (IL); Ron Bitton, Or-Yehuda (IL); Guy Shtar, Petah Tikva (IL); Yael Mathov Gome, Be'er Sheva (IL); and Henry Venturelli, Los Angeles, CA (US)
Assigned to Intuit Inc., Mountain View, CA (US)
Filed by Intuit Inc., Mountain View, CA (US)
Filed on Apr. 30, 2024, as Appl. No. 18/651,643.
Int. Cl. H04L 9/40 (2022.01)
CPC H04L 63/1441 (2013.01) 17 Claims
OG exemplary drawing
 
1. A method comprising:
receiving, at a server from a first user device, a first user prompt segment to a large language model (LLM);
obtaining a first additional prompt segment from a first prompt data source;
for each prompt segment of a first plurality of prompt segments, wherein the first plurality of prompt segments comprises the first additional prompt segment and the first user prompt segment:
obtaining a length value and a class, and
validating that the length value satisfies a threshold length value for the class;
making a determination, at least in part based on the length value satisfying the threshold length value for each prompt segment, that the first plurality of prompt segments does not correspond to a prompt injection event;
identifying a first electronic address in the first user prompt segment;
replacing the first electronic address with a first placeholder to generate a first updated prompt segment;
generating a first LLM prompt comprising the first updated prompt segment and the first additional prompt segment;
sending, responsive to the determination, the first LLM prompt to the LLM;
receiving a first response to the first LLM prompt from the LLM, the first response comprising the first placeholder;
replacing the first placeholder with the first electronic address to generate a first updated response; and
sending the first updated response to the first user device.
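Below are illustrative sketches of the claimed steps. The first covers the per-segment length/class validation: each segment carries a class (for example, the user prompt segment versus the additional prompt segment obtained from the prompt data source), and the determination that no prompt injection event occurred requires every segment's length value to satisfy the threshold for its class. All names, thresholds, and the reading of "satisfies" as "does not exceed" are assumptions; the patent does not publish an implementation.

```python
from dataclasses import dataclass

# Hypothetical per-class length thresholds; the claim only requires that
# each segment's length value satisfy a threshold for its class.
THRESHOLDS = {"user": 2_000, "retrieved": 8_000}

@dataclass
class PromptSegment:
    text: str
    seg_class: str  # e.g. "user" for the user prompt segment,
                    # "retrieved" for the additional prompt segment

def is_prompt_injection_event(segments: list[PromptSegment]) -> bool:
    """Return True if any segment's length value fails the threshold for
    its class (here, 'satisfies' is assumed to mean 'does not exceed')."""
    for seg in segments:
        length_value = len(seg.text)
        if length_value > THRESHOLDS[seg.seg_class]:
            return True  # threshold violated -> treat as injection event
    return False  # determination: no prompt injection event
```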
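The claim's redaction steps replace a first electronic address in the user prompt segment with a placeholder before the prompt reaches the LLM, then swap the address back into the LLM's response. A minimal sketch, assuming a regex-based detector for one kind of electronic address (email) and opaque placeholder tokens; the mapping is retained server-side so the placeholder in the response can be restored:

```python
import re
import uuid

# Assumed detector for one kind of "electronic address"; the claim covers
# electronic addresses generally (e.g. email addresses).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_addresses(segment: str) -> tuple[str, dict[str, str]]:
    """Replace each electronic address with a placeholder, returning the
    updated prompt segment and the placeholder-to-address mapping."""
    mapping: dict[str, str] = {}
    def _swap(match: re.Match) -> str:
        placeholder = f"<ADDR_{uuid.uuid4().hex[:8]}>"
        mapping[placeholder] = match.group(0)
        return placeholder
    return EMAIL_RE.sub(_swap, segment), mapping

def restore_addresses(response: str, mapping: dict[str, str]) -> str:
    """Replace each placeholder in the LLM response with the original
    electronic address to generate the updated response."""
    for placeholder, address in mapping.items():
        response = response.replace(placeholder, address)
    return response
```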
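Putting the steps together, a hedged end-to-end sketch of the claim 1 flow on the server, reusing the helpers from the sketches above. The `call_llm` function is a stub standing in for whatever LLM endpoint is used; it and all other names are assumptions, not the patented implementation:

```python
def handle_user_prompt(user_segment: str, additional_segment: str) -> str:
    """Hypothetical server-side handler tracing the steps of claim 1."""
    segments = [
        PromptSegment(user_segment, "user"),
        PromptSegment(additional_segment, "retrieved"),
    ]
    # Validate every segment's length value against its class threshold.
    if is_prompt_injection_event(segments):
        raise ValueError("prompt injection event: segment failed length check")

    # Replace electronic addresses with placeholders before the LLM sees them.
    updated_user_segment, mapping = redact_addresses(user_segment)

    # Generate the LLM prompt from the updated (redacted) user segment and
    # the additional segment, send it, and restore placeholders in the reply.
    llm_prompt = f"{additional_segment}\n\n{updated_user_segment}"
    response = call_llm(llm_prompt)  # assumed LLM client call
    return restore_addresses(response, mapping)

def call_llm(prompt: str) -> str:
    # Stub standing in for the real model call.
    return f"echo: {prompt}"
```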