US 12,393,404 B2
Sample-difference-based method and system for interpreting deep-learning model for code classification
Zhen Li, Wuhan (CN); Ruqian Zhang, Wuhan (CN); Deqing Zou, Wuhan (CN); Hai Jin, Wuhan (CN); and Yangrui Li, Wuhan (CN)
Assigned to Huazhong University of Science and Technology, Wuhan (CN)
Filed by Huazhong University of Science and Technology, Wuhan (CN)
Filed on Sep. 27, 2023, as Appl. No. 18/475,447.
Claims priority of application No. 202211612114.2 (CN), filed on Dec. 9, 2022.
Prior Publication US 2024/0192929 A1, Jun. 13, 2024
Int. Cl. G06F 8/35 (2018.01); G06F 8/41 (2018.01)
CPC G06F 8/35 (2013.01) [G06F 8/42 (2013.01)]
8 Claims
 
1. A sample-difference-based method for interpreting a deep-learning model for code classification, the method comprising:
off-line training an interpreter comprising:
constructing code transformation for every code sample in a training set to generate difference samples;
generating the difference samples through feature deletion and inputting samples before and after the feature deletion into the model to be interpreted to acquire prediction confidence levels corresponding to tags in the training set, calculating a difference between the prediction confidence levels of two samples as feature importance scores;
generating the difference samples through extracting code snippets and inputting the code snippet made using a feature immediately before a feature to be calculated as its snipping point and the code snippet made using the feature to be calculated as its snipping point into the model to be interpreted to acquire the prediction confidence levels corresponding to the tags, calculating the difference between the prediction confidence levels of two samples as the feature importance scores; and
inputting the original samples, the difference samples and the feature importance scores into a neural network, constructing a deep neural network framework for the interpreter and two approximators, defining a loss function, then fixing the two approximators and training the interpreter, then fixing the interpreter and training the two approximators, and circularly iterating the training until loss convergence, so as to eventually obtain a trained interpreter; and
on-line interpreting the code samples comprising:
using the trained interpreter to extract important features from snippets of object code samples, then using an influence-function-based method to calculate an effect of removing a training sample in the training set on loss of prediction sample, so as to identify training samples in the training set that are most contributive to prediction for the training sample, comparing the extracted important features and the most contributive training samples, and generating interpretation results for the object code samples,
wherein the constructing code transformation to generate the difference samples comprises:
for the input code samples, scanning all code transformation points that meet transformation requirements, generating a plurality of transformation vectors of corresponding dimensions, conducting the code transformation with the generated transformation vectors, screening the generated difference samples, and deleting the samples that do not meet expectations so as to get a difference sample set.
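
The two feature importance scores recited in claim 1 (one from feature deletion, one from code snippets cut at adjacent features) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names toy_model, deletion_scores and snippet_scores, the token-level features, the token weights, and the sign convention (confidence with the feature minus confidence without it) are all assumptions made for illustration; the claim only states that the difference between the two prediction confidence levels is used as the score.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for "the model to be interpreted": maps a list of
# features (here, tokens) to the prediction confidence for the sample's tag.
Model = Callable[[Sequence[str]], float]


def deletion_scores(model: Model, features: Sequence[str]) -> List[float]:
    """Score feature i as confidence(original) - confidence(sample with i deleted)."""
    base = model(features)
    scores = []
    for i in range(len(features)):
        reduced = list(features[:i]) + list(features[i + 1:])  # difference sample
        scores.append(base - model(reduced))
    return scores


def snippet_scores(model: Model, features: Sequence[str]) -> List[float]:
    """Score feature i as confidence(snippet cut at i) - confidence(snippet cut at i-1)."""
    scores = []
    for i in range(len(features)):
        cut_before = features[:i]      # snipping point: the feature just before i
        cut_at = features[:i + 1]      # snipping point: feature i itself
        scores.append(model(cut_at) - model(cut_before))
    return scores


def toy_model(features: Sequence[str]) -> float:
    # Assumed toy classifier: confidence grows when risky calls are present.
    # Weights are exact binary fractions so the arithmetic below is exact.
    weights = {"strcpy": 0.5, "gets": 0.25}
    return min(1.0, 0.125 + sum(weights.get(f, 0.0) for f in set(features)))


sample = ["char", "buf", "strcpy", "gets"]
del_scores = deletion_scores(toy_model, sample)
snip_scores = snippet_scores(toy_model, sample)
print(del_scores)   # [0.0, 0.0, 0.5, 0.25]
print(snip_scores)  # [0.0, 0.0, 0.5, 0.25]
```

In this toy setting both scores agree because the assumed model is additive over tokens. With a real classifier the two difference samples generally yield different scores, which is presumably why the claim recites both.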