US 12,242,613 B2
Automated evaluation of machine learning models
Mathieu Sinn, Mulhuddart (IE); Beat Buesser, Mulhuddart (IE); Ngoc Minh Tran, Dublin (IE); Killian Levacher, Dublin (IE); and Hessel Tuinhof, Dublin (IE)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Sep. 30, 2020, as Appl. No. 16/948,762.
Prior Publication US 2022/0100867 A1, Mar. 31, 2022
Int. Cl. G06F 21/57 (2013.01); G06N 20/00 (2019.01)
CPC G06F 21/577 (2013.01) [G06N 20/00 (2019.01); G06F 2221/034 (2013.01)]
25 Claims
 
1. A method for automated evaluation of machine learning models in a computing environment by one or more processors comprising:
receiving a machine learning model, a data set used for testing the machine learning model, one or more adversarial operation objectives, an adversarial threat model, and a selected number of hyperparameters;
automatically evaluating and determining a level of robustness of the machine learning model against an adversarial whitebox operation and analyzing results from the adversarial whitebox operation using the data set, the one or more adversarial operation objectives, the adversarial threat model, and the selected number of hyperparameters, wherein the analyzing includes:
localizing a plurality of masked gradients within a graphical representation of the machine learning model, wherein the plurality of masked gradients is localized using brute-force or AI search on one or more nodes causing gradient masking;
unmasking one or more of the plurality of masked gradients by applying backward-pass differentiable approximations or reparameterizations;
diagnosing gradient information of one or more unmasked gradients and a current adversarial operation loss function;
identifying one or more parameter optimizations based on the diagnosing of the gradient information;
determining a first robustness score for the machine learning model by which the machine learning model is ranked, wherein the first robustness score indicates a level of security of the machine learning model against the adversarial whitebox operation;
generating an evaluation summary, wherein the evaluation summary includes the first robustness score, a provenance of each adversarial sample, and any issues that are recommended to be resolved manually;
generating a modified machine learning model by modifying an unmasking of gradients of the machine learning model by applying the backward-pass differentiable approximations or the reparameterizations by adjusting the one or more parameter optimizations previously identified;
determining a second robustness score for the modified machine learning model by evaluating the level of robustness of the modified machine learning model against the adversarial whitebox operations using partial inputs from a previous evaluation of the machine learning model;
ranking the modified machine learning model according to the level of robustness using the second robustness score, wherein a machine learning operation is initialized to learn and store the level of robustness of the machine learning model against the adversarial whitebox operations based on the machine learning model, the data set used for testing the machine learning model, the one or more adversarial objectives, the adversarial threat model, and a selected number of hyperparameters.
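
Illustrative sketch (not part of the claim): the unmasking and scoring steps recited above can be pictured with a toy example. The code below is a minimal sketch under stated assumptions, not the patented implementation. It assumes a hypothetical logistic model whose input passes through a quantization node that masks gradients, uses an identity function as the backward-pass differentiable approximation, swaps in a single-step sign-gradient (FGSM-style) perturbation as the whitebox operation, and takes accuracy under attack as the robustness score. All names and values (LEVELS, W, B, DATA, eps) are invented for illustration.

```python
import math

LEVELS = 4  # hypothetical resolution of the gradient-masking quantization node


def quantize(x):
    # Piecewise-constant node: its true gradient is zero almost everywhere.
    return [round(v * LEVELS) / LEVELS for v in x]


def predict_proba(w, b, x):
    z = sum(wi * qi for wi, qi in zip(w, quantize(x))) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    # Binary cross-entropy, used here as the adversarial loss function.
    p = predict_proba(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def masked_grad(w, b, x, y, h=1e-4):
    # Central finite differences taken through the quantizer (the masked view).
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((loss(w, b, up, y) - loss(w, b, down, y)) / (2 * h))
    return grad


def bpda_grad(w, b, x, y):
    # Forward pass keeps the real quantizer; the backward pass treats it
    # as the identity (dq/dx is approximated by 1), which unmasks the gradient.
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def perturb(x, grad, eps):
    # Single-step sign-gradient perturbation, clipped to the valid range [0, 1].
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]


def robustness_score(w, b, data, grad_fn, eps):
    # Fraction of test samples still classified correctly after the attack.
    correct = 0
    for x, y in data:
        x_adv = perturb(x, grad_fn(w, b, x, y), eps)
        correct += int((predict_proba(w, b, x_adv) > 0.5) == bool(y))
    return correct / len(data)


# Hypothetical model parameters and test set.
W, B = [2.0, -2.0], 0.25
DATA = [([0.8, 0.2], 1), ([0.2, 0.8], 0), ([0.7, 0.45], 1), ([0.05, 0.95], 0)]

masked_score = robustness_score(W, B, DATA, masked_grad, eps=0.3)
bpda_score = robustness_score(W, B, DATA, bpda_grad, eps=0.3)
print("score with masked gradients:", masked_score)
print("score with unmasked gradients:", bpda_score)
```

In this toy setting the masked gradient is zero, so the attack leaves every input unchanged and the score looks better than the model deserves. The approximated gradient gives the attack a usable direction, and the score drops. A gap of this kind between the two scores is one plausible signal that a node is masking gradients.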