US 12,111,926 B1
Generative artificial intelligence model output obfuscation
David Beveridge, Vancouver, WA (US); Tanner Burns, Austin, TX (US); Kwesi Cappel, Austin, TX (US); and Kenneth Yeung, Ottawa (CA)
Assigned to HiddenLayer, Inc., Austin, TX (US)
Filed by HiddenLayer, Inc., Austin, TX (US)
Filed on May 20, 2024, as Appl. No. 18/669,379.
Int. Cl. G06F 21/55 (2013.01)
CPC G06F 21/554 (2013.01) [G06F 2221/033 (2013.01)] 22 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
receiving, from each of a plurality of requesters, data characterizing a corresponding prompt for ingestion by a first generative artificial intelligence (GenAI) model;
determining, for each prompt, that the prompt comprises malicious content or elicits undesired model behavior; and
initiating, in response to the determining, at least one remediation action;
wherein, for a first subset of the prompts, the at least one remediation action comprises:
inputting at least a portion of the received data into the first GenAI model to obtain a first output;
inputting at least a portion of the first output along with obfuscation instructions into a second, different GenAI model to obtain a second output; and
returning data characterizing the second output to the requester;
wherein, for a second subset of the prompts, the at least one remediation action comprises:
blocking an Internet Protocol (IP) address and/or media access control (MAC) address of a corresponding requester from accessing the first GenAI model.
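The claimed flow above can be sketched in code. This is a minimal illustrative sketch, not the patented implementation: the class and method names (`Gateway`, `detect_malicious`, `first_model`, `obfuscator_model`), the keyword-based detector, the severity flag used to select between the two remediation subsets, and the obfuscation instruction text are all assumptions introduced here for clarity.

```python
# Hedged sketch of the claim-1 flow: detect a malicious prompt, then either
# (a) pass the first model's output through a second, different model with
# obfuscation instructions, or (b) block the requester's address.
# All names and heuristics below are illustrative assumptions, not from
# the patent.
from dataclasses import dataclass, field
from typing import Optional

# Assumed obfuscation instructions for the second GenAI model.
OBFUSCATION_INSTRUCTIONS = (
    "Rewrite the following text so that sensitive details are obscured, "
    "returning only a generic, harmless summary."
)


@dataclass
class Gateway:
    """Hypothetical front end mediating requester access to a GenAI model."""

    blocklist: set = field(default_factory=set)  # blocked IP/MAC addresses

    def detect_malicious(self, prompt: str) -> bool:
        # Placeholder detector: flags a known prompt-injection marker.
        # A real system would use a trained classifier or policy engine.
        return "IGNORE PREVIOUS INSTRUCTIONS" in prompt.upper()

    def first_model(self, prompt: str) -> str:
        # Stand-in for the first GenAI model's inference call.
        return f"model-output({prompt})"

    def obfuscator_model(self, text: str, instructions: str) -> str:
        # Stand-in for the second, different GenAI model that applies
        # the obfuscation instructions to the first output.
        return f"obfuscated({text})"

    def handle(self, requester_addr: str, prompt: str,
               severity: str = "low") -> Optional[str]:
        if requester_addr in self.blocklist:
            return None  # requester already blocked from the first model
        if not self.detect_malicious(prompt):
            return self.first_model(prompt)  # benign path: no remediation
        if severity == "low":
            # First subset: obtain the first output, then obfuscate it
            # via the second model before returning it to the requester.
            first_output = self.first_model(prompt)
            return self.obfuscator_model(first_output,
                                         OBFUSCATION_INSTRUCTIONS)
        # Second subset: block the requester's IP/MAC address.
        self.blocklist.add(requester_addr)
        return None
```

In this sketch, the low-severity path returns an obfuscated response (so the attacker learns little from the model's behavior), while the high-severity path denies further access entirely; the severity threshold is an assumed policy knob, since the claim itself only requires that the two remediation actions apply to two subsets of the malicious prompts.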