US 12,111,926 B1
Generative artificial intelligence model output obfuscation
David Beveridge, Vancouver, WA (US); Tanner Burns, Austin, TX (US); Kwesi Cappel, Austin, TX (US); and Kenneth Yeung, Ottawa (CA)
Assigned to HiddenLayer, Inc., Austin, TX (US)
Filed by HiddenLayer, Inc., Austin, TX (US)
Filed on May 20, 2024, as Appl. No. 18/669,379.
Int. Cl. G06F 21/55 (2013.01)
CPC G06F 21/554 (2013.01) [G06F 2221/033 (2013.01)] 22 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
receiving, from each of a plurality of requesters, data characterizing a corresponding prompt for ingestion by a first generative artificial intelligence (GenAI) model;
determining, for each prompt, that the prompt comprises malicious content or elicits undesired model behavior; and
initiating, in response to the determining, at least one remediation action;
wherein, for a first subset of the prompts, the at least one remediation action comprises:
inputting at least a portion of the received data into the first GenAI model to obtain a first output;
inputting at least a portion of the first output along with obfuscation instructions into a second, different GenAI model to obtain a second output; and
returning data characterizing the second output to the requester;
wherein, for a second subset of the prompts, the at least one remediation action comprises:
blocking an Internet Protocol (IP) address and/or media access control (MAC) address of a corresponding requester from accessing the first GenAI model.
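The claimed flow above can be sketched in code. This is a minimal illustrative sketch, not the patented implementation: the class and method names (`Gateway`, `detect_malicious`, `first_model`, `obfuscator_model`), the keyword-based detector, the severity flag used to select between the two remediation subsets, and the obfuscation instruction text are all assumptions introduced here for clarity.

```python
# Hedged sketch of the claim-1 flow: detect a malicious prompt, then either
# (a) pass the first model's output through a second, different model with
# obfuscation instructions, or (b) block the requester's address.
# All names and heuristics below are illustrative assumptions, not from
# the patent.
from dataclasses import dataclass, field
from typing import Optional

# Assumed obfuscation instructions for the second GenAI model.
OBFUSCATION_INSTRUCTIONS = (
    "Rewrite the following text so that sensitive details are obscured, "
    "returning only a generic, harmless summary."
)


@dataclass
class Gateway:
    """Hypothetical front end mediating requester access to a GenAI model."""

    blocklist: set = field(default_factory=set)  # blocked IP/MAC addresses

    def detect_malicious(self, prompt: str) -> bool:
        # Placeholder detector: flags a known prompt-injection marker.
        # A real system would use a trained classifier or policy engine.
        return "IGNORE PREVIOUS INSTRUCTIONS" in prompt.upper()

    def first_model(self, prompt: str) -> str:
        # Stand-in for the first GenAI model's inference call.
        return f"model-output({prompt})"

    def obfuscator_model(self, text: str, instructions: str) -> str:
        # Stand-in for the second, different GenAI model that applies
        # the obfuscation instructions to the first output.
        return f"obfuscated({text})"

    def handle(self, requester_addr: str, prompt: str,
               severity: str = "low") -> Optional[str]:
        if requester_addr in self.blocklist:
            return None  # requester already blocked from the first model
        if not self.detect_malicious(prompt):
            return self.first_model(prompt)  # benign path: no remediation
        if severity == "low":
            # First subset: obtain the first output, then obfuscate it
            # via the second model before returning it to the requester.
            first_output = self.first_model(prompt)
            return self.obfuscator_model(first_output,
                                         OBFUSCATION_INSTRUCTIONS)
        # Second subset: block the requester's IP/MAC address.
        self.blocklist.add(requester_addr)
        return None
```

In this sketch, the low-severity path returns an obfuscated response (so the attacker learns little from the model's behavior), while the high-severity path denies further access entirely; the severity threshold is an assumed policy knob, since the claim itself only requires that the two remediation actions apply to two subsets of the malicious prompts.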