CPC G06F 11/3684 (2013.01) [G06F 11/3688 (2013.01); G06F 11/3692 (2013.01); G06N 5/045 (2013.01)] | 18 Claims |
1. A computing system comprising:
at least one processor; and
one or more non-transitory computer-readable media storing instructions, which, when executed by the at least one processor, perform operations comprising:
obtaining a set of guidelines defining one or more operation boundaries of an AI application,
wherein the AI application is configured to generate, in response to a received input, an outcome and an explanation of the outcome;
constructing a set of test cases associated with each guideline in the set of guidelines,
wherein each test case comprises: (1) a prompt, (2) an expected outcome, and (3) an expected explanation, and
wherein each test case is configured to test the one or more operation boundaries of the guidelines;
evaluating the AI application against the set of test cases to determine compliance of the AI application with the set of guidelines by:
supplying a prompt of a particular test case into the AI application,
receiving, from the AI application, a case-specific outcome and a corresponding case-specific explanation of the case-specific outcome, and
comparing (1) the expected outcome of the particular test case to the case-specific outcome received from the AI application and (2) the expected explanation of the particular test case to the corresponding case-specific explanation of the case-specific outcome;
using the evaluation, generating a compliance indicator of the AI application indicating compliance of the AI application with the set of guidelines; and
automatically adjusting parameters of the AI application based on the evaluation and the compliance indicator,
wherein adjusting the parameters of the AI application aligns the AI application within the one or more operation boundaries of the set of guidelines.
|
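The operations recited in claim 1 can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `ToyLoanApp` stand-in for the AI application, its single `threshold` parameter, the substring-based explanation comparison, the pass-fraction compliance indicator, and the fixed-step adjustment loop. The claim does not specify how explanations are compared or how parameters are adjusted, so these are assumptions chosen only to keep the example small and runnable.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GuidelineTestCase:
    """One test case: a prompt, an expected outcome, an expected explanation."""
    guideline_id: str
    prompt: str
    expected_outcome: str
    expected_explanation: str


class ToyLoanApp:
    """Hypothetical AI application returning (outcome, explanation) for a prompt."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # the only adjustable parameter in this toy

    def __call__(self, prompt: str) -> Tuple[str, str]:
        score = int(prompt.split("=")[1])  # prompts look like "score=590"
        if score >= self.threshold:
            return "approve", f"score {score} is at or above threshold {self.threshold}"
        return "deny", f"score {score} is below threshold {self.threshold}"


def explanations_match(expected: str, actual: str) -> bool:
    # Assumption: case-insensitive substring containment stands in for
    # whatever explanation comparison a real system would use.
    return expected.lower() in actual.lower()


def evaluate(app: ToyLoanApp, cases: List[GuidelineTestCase]) -> List[bool]:
    """Supply each prompt, then compare both the outcome and the explanation."""
    results = []
    for case in cases:
        outcome, explanation = app(case.prompt)
        results.append(
            outcome == case.expected_outcome
            and explanations_match(case.expected_explanation, explanation)
        )
    return results


def compliance_indicator(results: List[bool]) -> float:
    """Assumed indicator: fraction of test cases passed (0.0 if there are none)."""
    return sum(results) / len(results) if results else 0.0


def adjust_until_compliant(app: ToyLoanApp, cases: List[GuidelineTestCase],
                           step: int = 10, max_rounds: int = 20) -> float:
    """Assumed adjustment: raise the threshold in fixed steps until all cases pass."""
    score = compliance_indicator(evaluate(app, cases))
    for _ in range(max_rounds):
        if score == 1.0:
            break
        app.threshold += step
        score = compliance_indicator(evaluate(app, cases))
    return score


# Hypothetical guideline G1: applicants scoring below 600 must be denied.
CASES = [
    GuidelineTestCase("G1", "score=590", "deny", "below threshold 600"),
    GuidelineTestCase("G1", "score=610", "approve", "at or above threshold 600"),
]

app = ToyLoanApp(threshold=580)
initial = compliance_indicator(evaluate(app, CASES))
final = adjust_until_compliant(app, CASES)
print(initial, final, app.threshold)
```

In this sketch the second test case fails at first even though its outcome is correct, because the explanation cites the wrong threshold. That is the reason the claim compares the explanation as well as the outcome.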