CPC G06F 40/284 (2020.01) [G06F 21/577 (2013.01); G06F 40/253 (2020.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01); G06F 2221/033 (2013.01)]; 20 Claims
1. A method for determining the robustness of a natural language processing (NLP) model, the method comprising:
obtaining, by a processing circuitry, a modification trigger for a first NLP model using test data obtained for the first NLP model;
determining, by the processing circuitry, a modifying token that corresponds to the modification trigger;
generating, by the processing circuitry, modified test data for the first NLP model using the test data, the modification trigger, and the modifying token;
providing, by the processing circuitry, the test data and the modified test data as input to the first NLP model;
for each of the test data and the modified test data provided as input to the first NLP model, obtaining, by the processing circuitry, a corresponding output from the first NLP model;
determining, by the processing circuitry and using a machine learning model, whether the modified test data provided as input to the first NLP model corresponds to an output that does not satisfy a similarity criterion with respect to the output corresponding to the test data;
determining an output-changing modification rule based on whether the modified test data corresponds to an output that does not satisfy the similarity criterion with respect to the output corresponding to the test data;
generating a set of instances of modified test data based on a set of instances of test data and the output-changing modification rule;
providing a portion of the set of instances of modified test data and the set of instances of test data to a second NLP model;
for each instance of the portion of the set of instances of modified test data and the set of instances of test data provided to the second NLP model, obtaining a corresponding second NLP model output from the second NLP model;
based on the second NLP model output for each of the instances of the portion of the set of instances of modified test data and the set of instances of test data provided to the second NLP model, determining robustness information for the second NLP model; and
causing the robustness information of the second NLP model to be provided.
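The claimed flow can be sketched in code. The sketch below is purely illustrative and makes several simplifying assumptions not found in the claim: `toy_nlp_model` is a hypothetical stand-in for both the first and second NLP models, `outputs_similar` replaces the claimed machine-learning-based similarity determination with exact string match, and the modification rule is modeled as a simple (trigger, token) word substitution.

```python
def toy_nlp_model(text: str) -> str:
    """Hypothetical stand-in for an NLP model: a keyword-based
    sentiment labeler (not the claimed model)."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "awful", "terrible"}
    words = set(text.lower().split())
    if words & negative:
        return "negative"
    if words & positive:
        return "positive"
    return "neutral"


def outputs_similar(a: str, b: str) -> bool:
    """Simplified similarity criterion; the claim instead uses a
    machine learning model to make this determination."""
    return a == b


def find_output_changing_rule(model, test_data, trigger, token):
    """Generate modified test data by substituting the modification
    trigger with the modifying token, run both through the model,
    and keep (trigger, token) as an output-changing modification
    rule when the outputs fail the similarity criterion."""
    modified = test_data.replace(trigger, token)
    if not outputs_similar(model(test_data), model(modified)):
        return (trigger, token)
    return None


def robustness(model, instances, rule):
    """Apply the rule to a set of instances of test data and report
    the fraction whose output survives the modification."""
    trigger, token = rule
    unchanged = 0
    for text in instances:
        modified = text.replace(trigger, token)
        if outputs_similar(model(text), model(modified)):
            unchanged += 1
    return unchanged / len(instances)


# Derive a rule from one test input, then score a second model
# (here the same toy model) on a small instance set.
rule = find_output_changing_rule(toy_nlp_model, "the movie was good", "good", "bad")
score = robustness(toy_nlp_model, ["a good plot", "good acting overall", "a quiet film"], rule)
```

In this toy run the substitution "good" to "bad" flips the model's output on the probe sentence, so it is kept as an output-changing modification rule; the robustness score is then the share of instances whose output is unchanged under that rule.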