| CPC G06N 3/045 (2023.01) [G06F 40/40 (2020.01); G06N 3/0475 (2023.01); G06N 3/08 (2013.01)] | 20 Claims |

|
1. A method for constructing layered prompts to evaluate and assess performance of pre-trained large language models, the method comprising:
obtaining a set of application domains of a pre-trained large language model (LLM) in which the pre-trained LLM will be used,
wherein the pre-trained LLM is configured to generate, in response to a received input, a response;
using the set of application domains, determining a set of guidelines defining one or more operation boundaries of the pre-trained LLM by mapping each application domain of the set of application domains to one or more guidelines of the set of guidelines;
determining a set of layers for the pre-trained LLM associated with the one or more guidelines of the set of guidelines,
wherein each layer within the set of layers includes a layer-specific model logic and a set of variables associated with the one or more guidelines of each corresponding layer, wherein the layer-specific model logic includes weights, biases, activation functions, and regulatory or contextual parameters, and
wherein each variable in the set of variables represents an attribute identified within the one or more guidelines of each corresponding layer;
for a first set of one or more layers of the set of layers, constructing a first test case comprising (1) a first layered prompt and (2) a first expected response,
wherein the constructing the first test case comprises transforming the first set of one or more layers of the set of layers using a rule-based engine, wherein the rule-based engine maps the first test case to a first scenario derived from the first set of one or more layers of the set of layers, and performing computations that contributes to an overall decision-making process using each layer-specific model logic in the pre-trained LLM;
wherein the first layered prompt is configured to measure one or more values of a corresponding set of variables of the first set of one or more layers, and
wherein the first test case is configured to test the one or more operation boundaries of corresponding guidelines of the first set of the one or more layers of the set of layers;
executing the first test case to evaluate the pre-trained LLM by:
supplying the first layered prompt into the pre-trained LLM, and
responsive to inputting the first layered prompt, receiving, from the pre-trained LLM, for the first layered prompt, a set of responses;
aggregating the set of responses received from each layer using weights for each layer;
generating an overall result based on the aggregated responses;
generating an indicator of compliance with the guidelines by comparing the first expected response of the first test case to the set of responses generated as an overall result received from the pre-trained LLM, wherein the indicator of compliance reflects specific layers of the first layered prompt, variables of the first layered prompt, or weights assigned to each layer;
using the indicator of compliance with the guidelines generated based on comparison of the first expected response of the first test case to the set of responses received from the pre-trained LLM, dynamically constructing a second test case testing a second set of one or more layers of the set of layers occurring subsequent to the first set of one of more layers of the set of layers,
wherein the constructing the second test case comprises transforming the second set of one or more layers of the set of layers using the rule-based engine, wherein the rule-based engine maps the second test case maps to a second scenario derived from the second set of one or more layers of the set of layers, and performing computations that contributes to an overall decision-making process using each layer-specific model logic in the pre-trained LLM;
wherein the second test case comprises: (1) a second layered prompt and (2) a second expected response, and
wherein the second test case is configured to test the one or more operation boundaries of corresponding guidelines of the second set of the one or more layers of the set of layers;
executing the second test case to evaluate the pre-trained LLM;
generating for display at a graphical user interface (GUI), a graphical layout including a first graphical representation indicating satisfaction of the LLM with the one or more guidelines of the set of guidelines of a corresponding application domain and a second graphical representation indicating the evaluations of the pre-trained LLM by the first test case and the second test case;
responsive to a user input received via the GUI, automatically executing a set of actions to modify one or more parameters of the pre-trained LLM; and
validating satisfaction of the pre-trained LLM with the set of guidelines by executing the first test case to compare the first expected response of the first test case with a second set of responses received from the pre-trained LLM.
|