CPC G06F 21/554 (2013.01) [G06F 2221/033 (2013.01)] | 22 Claims |
1. A computer-implemented method comprising:
receiving, from each of a plurality of requesters, data characterizing a corresponding prompt for ingestion by a first generative artificial intelligence (GenAI) model;
determining, for each prompt, that the prompt comprises malicious content or elicits undesired model behavior; and
initiating, in response to the determining, at least one remediation action;
wherein, for a first subset of the prompts, the at least one remediation action comprises:
inputting at least a portion of the received data into the first GenAI model to obtain a first output;
inputting at least a portion of the first output along with obfuscation instructions into a second, different GenAI model to obtain a second output; and
returning data characterizing the second output to the requester;
wherein, for a second subset of the prompts, the at least one remediation action comprises:
blocking an Internet Protocol (IP) address and/or a medium access control (MAC) address of a corresponding requester from accessing the first GenAI model.
|
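The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Gateway` class, the `is_malicious` and `should_block` callables, the `OBFUSCATION_INSTRUCTIONS` string, and the pass-through handling of benign prompts are all hypothetical names and assumptions introduced here, since the claim does not specify how detection is performed or how prompts are assigned to the first or second subset.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set

# Hypothetical type: any callable mapping input text to output text.
Model = Callable[[str], str]

# Assumed wording; the claim only requires that obfuscation instructions exist.
OBFUSCATION_INSTRUCTIONS = "Rewrite the following text so it reveals nothing sensitive:"


@dataclass
class Gateway:
    first_model: Model                    # the protected (first) GenAI model
    second_model: Model                   # a second, different GenAI model
    is_malicious: Callable[[str], bool]   # assumed detector (not specified in the claim)
    should_block: Callable[[str], bool]   # assumed policy selecting the "second subset"
    blocked: Set[str] = field(default_factory=set)  # blocked IP and/or MAC addresses

    def handle(self, requester_addr: str, prompt: str) -> Optional[str]:
        # Requesters blocked by an earlier remediation get no access.
        if requester_addr in self.blocked:
            return None

        # Assumption: benign prompts pass straight through (outside the claim).
        if not self.is_malicious(prompt):
            return self.first_model(prompt)

        # Second subset: block the requester's address.
        if self.should_block(prompt):
            self.blocked.add(requester_addr)
            return None

        # First subset: run the first model, then obfuscate its output
        # with the second model before returning anything.
        first_output = self.first_model(prompt)
        second_input = OBFUSCATION_INSTRUCTIONS + "\n" + first_output
        return self.second_model(second_input)
```

In this sketch the requester in the first subset still receives a response, but only the second model's output, so the raw first output never leaves the gateway; a requester in the second subset receives nothing and is denied on later requests.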