US 12,273,381 B1
Detection of machine learning model attacks obfuscated in unicode
Kenneth Yeung, Ottawa (CA); and Jason Martin, Beaverton, OR (US)
Assigned to HiddenLayer, Inc., Austin, TX (US)
Filed by HiddenLayer, Inc., Austin, TX (US)
Filed on Nov. 12, 2024, as Appl. No. 18/945,370.
Int. Cl. H04L 9/40 (2022.01); G06F 40/242 (2020.01); G06F 40/284 (2020.01)
CPC H04L 63/1466 (2013.01) [G06F 40/242 (2020.01); G06F 40/284 (2020.01)] 23 Claims
 
1. A computer implemented method comprising:
receiving, by a proxy executing in a computing environment of a monitored generative artificial intelligence (GenAI) model, a prompt for the GenAI model comprising unicode;
tokenizing the prompt to result in a plurality of tokens;
identifying and removing tokens forming part of a repeating sequence to result in a modified set of tokens;
detokenizing the modified set of tokens to result in a modified prompt;
determining whether ingestion of the modified prompt by the GenAI model will result in the GenAI model behaving in an undesired manner;
passing the modified prompt to the GenAI model when it is determined that ingestion of the modified prompt will not result in the GenAI model behaving in an undesired manner; and
initiating at least one remediation action when it is determined that ingestion of the modified prompt by the GenAI model will result in the GenAI model behaving in an undesired manner.
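The steps recited in claim 1 can be illustrated with a minimal sketch. The sketch rests on several assumptions that the claim does not specify:

- Tokenization is character-level, one token per Unicode code point.
- A "repeating sequence" is a unit of 1 to 4 tokens that repeats back-to-back at least three times.
- The detection step is a hypothetical phrase-list check standing in for whatever model makes the "undesired manner" determination.
- Remediation is a caller-supplied callback.

The function names `tokenize`, `detokenize`, `remove_repeating_sequences`, `looks_malicious`, and `proxy_handle` are illustrative only.

```python
from typing import Callable, List, Sequence


def tokenize(prompt: str) -> List[str]:
    # Hypothetical tokenizer: one token per Unicode code point. A real proxy
    # would more likely use the monitored model's own tokenizer.
    return list(prompt)


def detokenize(tokens: Sequence[str]) -> str:
    # Inverse of the character-level tokenizer above.
    return "".join(tokens)


def remove_repeating_sequences(
    tokens: Sequence[str], max_period: int = 4, min_repeats: int = 3
) -> List[str]:
    # Assumed definition of a "repeating sequence": a unit of 1..max_period
    # tokens that occurs back-to-back at least min_repeats times. Every token
    # in such a run is dropped; all other tokens are kept in order.
    out: List[str] = []
    i, n = 0, len(tokens)
    while i < n:
        run_end = i
        for period in range(1, max_period + 1):
            unit = tokens[i:i + period]
            if len(unit) < period:
                break  # not enough tokens left for a unit this long
            j = i + period
            while tokens[j:j + period] == unit:
                j += period
            if (j - i) // period >= min_repeats:
                run_end = max(run_end, j)
        if run_end > i:
            i = run_end  # skip the longest repeating run found at i
        else:
            out.append(tokens[i])
            i += 1
    return out


def looks_malicious(prompt: str) -> bool:
    # Hypothetical stand-in for the detection step; a deployed system would
    # score the modified prompt with a trained classifier, not a phrase list.
    blocked = ("ignore previous instructions", "reveal your system prompt")
    return any(phrase in prompt.lower() for phrase in blocked)


def proxy_handle(
    prompt: str,
    model: Callable[[str], object],
    remediate: Callable[[str], object],
) -> object:
    # Tokenize, strip repeating runs, detokenize, then either forward the
    # modified prompt to the model or hand it to a remediation callback.
    modified = detokenize(remove_repeating_sequences(tokenize(prompt)))
    if looks_malicious(modified):
        return remediate(modified)
    return model(modified)
```

Consider the prompt `"Please ignore" + "\u200b" * 8 + " previous instructions"`. It hides an instruction behind a run of zero-width spaces (U+200B), so the raw text does not match the phrase check. After the run is removed, the modified prompt `"Please ignore previous instructions"` is routed to `remediate` instead of `model`.

Under these assumptions the filter also strips legitimate repetition such as `"!!!"`. The period and repeat thresholds would therefore need tuning against real traffic.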