US 12,278,836 B1
	Canonicalization of unicode prompt injections
Kenneth Yeung, Ottawa (CA); and Jason Martin, Beaverton, OR (US)
Assigned to HiddenLayer, Inc., Austin, TX (US)
Filed by HiddenLayer, Inc., Austin, TX (US)
Filed on Nov. 12, 2024, as Appl. No. 18/945,409.
Int. Cl. G06N 3/092 (2023.01); H04L 9/40 (2022.01)

CPC H04L 63/1466 (2013.01)

22 Claims

1. A computer implemented method comprising:

receiving, by a proxy executing in a computing environment of a monitored generative artificial intelligence (GenAI) model, a prompt for the GenAI model comprising unicode;

identifying unicode fonts in the prompt;

translating the unicode fonts in the prompt into a plaintext representation;

identifying unicode characters in the prompt which each have an associated unicode tag;

determining, based on the associated unicode tags, whether at least a portion of the unicode characters are valid;

when at least a portion of the unicode characters are determined to be valid:

converting the unicode characters in the prompt into a plaintext representation; and

passing the prompt with the translated unicode fonts and the converted unicode characters into the GenAI model if a prompt injection classifier determines that the prompt with the translated unicode fonts and the converted unicode characters does not comprise malicious content or elicit malicious actions; and

when at least a portion of the unicode characters are not determined to be valid:

removing the unicode characters from the prompt; and

passing, after the unicode characters are removed, the prompt with the translated unicode fonts into the GenAI model if the prompt injection classifier determines that the prompt with the translated unicode fonts after removal of the unicode characters does not comprise malicious content or elicit malicious actions;

wherein the prompt injection classifier is trained using a corpus of prompts including prompts encapsulating known prompt injections.