US 12,223,063 B2
End-to-end measurement, grading and evaluation of pretrained artificial intelligence models via a graphical user interface (GUI) systems and methods
James Myers, New York, NY (US); William Franklin Cameron, Jacksonville, FL (US); Miriam Silver, Tel Aviv (IL); Prithvi Narayana Rao, Allen, TX (US); Pramod Goyal, Ahmedabad (IN); and Manjit Rajaretnam, Irving, TX (US)
Assigned to CITIBANK, N.A., NY (US)
Filed by Citibank, N.A., New York, NY (US)
Filed on Jun. 10, 2024, as Appl. No. 18/739,111.
Application 18/739,111 is a continuation in part of application No. 18/607,141, filed on Mar. 15, 2024.
Application 18/607,141 is a continuation in part of application No. 18/399,422, filed on Dec. 28, 2023.
Application 18/399,422 is a continuation of application No. 18/327,040, filed on May 31, 2023, granted, now 11,874,934, issued on Jan. 16, 2024.
Application 18/327,040 is a continuation in part of application No. 18/114,194, filed on Feb. 24, 2023, granted, now 11,763,006, issued on Sep. 19, 2023.
Application 18/114,194 is a continuation in part of application No. 18/098,895, filed on Jan. 19, 2023, granted, now 11,748,491, issued on Sep. 5, 2023.
Prior Publication US 2024/0411896 A1, Dec. 12, 2024
Int. Cl. G06F 21/57 (2013.01); G06F 21/55 (2013.01)
CPC G06F 21/577 (2013.01) [G06F 21/552 (2013.01)] 20 Claims
1. A method for grading a pre-trained Artificial Intelligence (AI) model via a graphical user interface (GUI), the method comprising:
obtaining a set of application domains in which the pre-trained AI model will be used,
wherein the pre-trained AI model is configured to generate, in response to a received input, a response;
using the set of application domains, determining a set of guidelines defining one or more operation boundaries of the pre-trained AI model by mapping each application domain of the set of application domains to one or more guidelines of the set of guidelines;
generating a set of test categories associated with the one or more guidelines of the set of guidelines,
wherein each of the set of test categories includes a set of benchmarks,
wherein each benchmark in the set of benchmarks is configured to indicate a degree of satisfaction of the pre-trained AI model with the one or more guidelines associated with a corresponding test category,
wherein the set of test categories includes at least two of: quality of training data of the pre-trained AI model, security measures of the pre-trained AI model, software development practices of the pre-trained AI model, satisfaction with regulations of the pre-trained AI model, and explainability of the response of the pre-trained AI model;
for each test category in the set of test categories, constructing a set of tests,
wherein each test comprises: (1) a prompt and (2) an expected response,
wherein each test is configured to test the degree of satisfaction of the pre-trained AI model with the one or more guidelines associated with the corresponding test category;
for each test of the sets of tests, obtaining, from the pre-trained AI model, a set of case-specific responses by:
transmitting the prompt of the test into one or more nodes of an input layer of the pre-trained AI model, wherein the one or more nodes are associated with the corresponding test category, and
responsive to transmitting the prompt, receiving, from an output layer of the pre-trained AI model, a case-specific response;
using the obtained sets of case-specific responses, assigning a grade, for each test category, to the pre-trained AI model in accordance with the set of benchmarks for the corresponding test category by, for each test:
comparing the expected response of the test to the case-specific response received from the pre-trained AI model;
using the assigned grades, mapping the assigned grades for each test category to a particular degree of satisfaction corresponding to one or more application domains of the pre-trained AI model; and
generating, for display at the GUI, a graphical layout indicating application-domain-specific grades, wherein the graphical layout includes a first graphical representation of each application domain of the pre-trained AI model and a second graphical representation of the corresponding application-domain-specific grades.
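Claim 1 states its steps abstractly. The following minimal Python sketch covers the setup, test-execution and per-category grading steps, under loudly stated assumptions. The domain, guideline and category tables, the letter-grade benchmark thresholds, and every function and class name are hypothetical. The model is abstracted as a callable from prompt to response rather than addressed at specific input-layer nodes. A response counts as a match only if it equals the expected response after trimming and lowercasing, a deliberately simple stand-in for whatever comparison an actual system would use.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical lookup tables; claim 1 does not specify their contents.
DOMAIN_TO_GUIDELINES = {
    "lending": ["fair-lending", "explainability"],
    "customer-support": ["data-privacy"],
}
GUIDELINE_TO_CATEGORY = {
    "fair-lending": "regulatory_satisfaction",
    "explainability": "explainability",
    "data-privacy": "security_measures",
}
# Assumed benchmark scheme: (minimum pass rate, grade), checked best first.
DEFAULT_BENCHMARKS = [(0.9, "A"), (0.75, "B"), (0.5, "C")]


@dataclass
class Test:
    prompt: str
    expected_response: str


@dataclass
class TestCategory:
    name: str
    guidelines: list[str] = field(default_factory=list)
    benchmarks: list[tuple[float, str]] = field(
        default_factory=lambda: list(DEFAULT_BENCHMARKS))
    tests: list[Test] = field(default_factory=list)


def determine_guidelines(domains: list[str]) -> list[str]:
    """Map each application domain to its guidelines, dropping duplicates."""
    guidelines: list[str] = []
    for domain in domains:
        for guideline in DOMAIN_TO_GUIDELINES.get(domain, []):
            if guideline not in guidelines:
                guidelines.append(guideline)
    return guidelines


def generate_categories(guidelines: list[str]) -> dict[str, TestCategory]:
    """Group guidelines into test categories, keyed by category name."""
    categories: dict[str, TestCategory] = {}
    for guideline in guidelines:
        name = GUIDELINE_TO_CATEGORY[guideline]
        categories.setdefault(name, TestCategory(name)).guidelines.append(guideline)
    return categories


def grade_category(model: Callable[[str], str], category: TestCategory) -> str:
    """Run every test in a category and convert its pass rate into a grade."""
    if not category.tests:
        return "N/A"
    passed = 0
    for test in category.tests:
        # The callable stands in for feeding the input layer and reading the output layer.
        response = model(test.prompt)
        # Assumed comparison rule: exact match after trimming and lowercasing.
        if response.strip().lower() == test.expected_response.strip().lower():
            passed += 1
    pass_rate = passed / len(category.tests)
    for threshold, grade in category.benchmarks:
        if pass_rate >= threshold:
            return grade
    return "F"


# Usage with a hypothetical toy model that knows only one canned answer.
CANNED = {"Why was the loan denied?": "insufficient income."}


def toy_model(prompt: str) -> str:
    return CANNED.get(prompt, "I don't know.")


categories = generate_categories(determine_guidelines(["lending"]))
categories["explainability"].tests = [
    Test("Why was the loan denied?", "Insufficient income."),
    Test("Which factor mattered most?", "Debt-to-income ratio."),
]
print(grade_category(toy_model, categories["explainability"]))  # prints "C"
```

With one of two explainability tests passing, the pass rate is 0.5, which meets only the assumed "C" threshold; the regulatory category has no tests yet and would be reported as "N/A".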
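The final two steps of claim 1, mapping per-category grades onto application domains and displaying them, can be sketched in the same hedged way. This is a hypothetical illustration, not the patented implementation: the domain-to-category table and grade scale are invented, each domain is assumed to receive the weakest grade among its categories (a conservative aggregation rule chosen only for illustration), and a plain-text table stands in for the GUI's graphical layout, with the first column representing the application domains and the second their grades.

```python
# Hypothetical table of which test categories bear on each application domain.
DOMAIN_CATEGORIES = {
    "lending": ["regulatory_satisfaction", "explainability"],
    "customer-support": ["security_measures"],
}
GRADE_ORDER = ["A", "B", "C", "F"]  # best to worst (assumed grading scale)


def domain_grades(category_grades: dict[str, str],
                  domain_categories: dict[str, list[str]]) -> dict[str, str]:
    """Give each domain the weakest grade among its categories (assumed rule)."""
    result: dict[str, str] = {}
    for domain, categories in domain_categories.items():
        grades = []
        for category in categories:
            grade = category_grades.get(category)
            if grade in GRADE_ORDER:  # skip ungraded or "N/A" categories
                grades.append(grade)
        # The worst grade has the highest index in GRADE_ORDER.
        result[domain] = max(grades, key=GRADE_ORDER.index) if grades else "N/A"
    return result


def render_layout(grades: dict[str, str]) -> str:
    """Plain-text stand-in for the GUI: domains in one column, grades in the next."""
    header = "Application domain"
    width = max([len(header)] + [len(domain) for domain in grades])
    lines = [f"{header:<{width}} | Grade", "-" * width + "-+------"]
    for domain, grade in grades.items():
        lines.append(f"{domain:<{width}} | {grade}")
    return "\n".join(lines)


# Usage with illustrative (not measured) category grades.
category_grades = {"regulatory_satisfaction": "B", "explainability": "C",
                   "security_measures": "A"}
print(render_layout(domain_grades(category_grades, DOMAIN_CATEGORIES)))
```

Taking the weakest grade means a single poorly graded category caps the grade of every domain that depends on it; averaging grade points would be a more lenient alternative.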