| CPC H04L 63/1466 (2013.01) | 22 Claims |

|
1. A computer implemented method comprising:
receiving, by a proxy executing in a computing environment of a monitored generative artificial intelligence (GenAI) model, a prompt for the GenAI model comprising unicode;
identifying unicode fonts in the prompt;
translating the unicode fonts in the prompt into a plaintext representation;
identifying unicode characters in the prompt which each have an associated unicode tag;
determining, based on the associated unicode tags, whether at least a portion of the unicode characters are valid;
when at least a portion of the unicode characters are determined to be valid:
converting the unicode characters in the prompt into a plaintext representation; and
passing the prompt with the translated unicode fonts and the converted unicode characters into the GenAI model if a prompt injection classifier determines that the prompt with the translated unicode fonts and the converted unicode characters does not comprise malicious content or elicit malicious actions; and
when at least a portion of the unicode characters are not determined to be valid:
removing the unicode characters from the prompt; and
passing, after the unicode characters are removed, the prompt with the translated unicode fonts into the GenAI model if the prompt injection classifier determines that the prompt with the translated unicode fonts after removal of the unicode characters does not comprise malicious content or elicit malicious actions;
wherein the prompt injection classifier is trained using a corpus of prompts including prompts encapsulating known prompt injections.
|