CPC G06F 21/566 (2013.01) [G06F 21/552 (2013.01); G06N 3/08 (2013.01); G06F 2221/034 (2013.01)] | 4 Claims |
1. A defense method for a deep learning model against adversarial attacks, comprising the following steps:
step S1, obtaining an untrained initial deep learning model M0, initial sample image data Dn used for training, and adversarial sample image data Da generated from Dn;
step S2, training the model M0 with the data Dn to obtain a model Mn, and training the model M0 with the data Dn and the data Da to obtain a model Ma;
step S3, constructing a filter layer and embedding it in front of an input layer of the model Mn to obtain a new model Mnc, wherein the filter layer spatially transforms input data from the adversarial sample toward the initial sample without losing image information;
step S4, training an image classification ability of the model Mnc by using the data Da, keeping the model parameters of the part of Mnc that originally belonged to Mn unchanged during the training, and training only the filter layer parameters; during the training, in addition to using a cross-entropy loss function to train the image classification ability of the model, adding a differentiable condition number constraint function to the loss function to limit the two-norm condition number of the weight matrix of the filter layer; and
step S5, taking the filter layer trained in step S4 out separately and inserting it in front of an input layer of the model Ma to obtain a deep learning model Mac that handles both initial images and images carrying adversarial attack noise and obtains a correct image classification;
wherein the differentiable condition number constraint function is as follows:

F(A) = ln(∥A∥F^2 · ∥A^(-1)∥F^2 - k^2 + v)
wherein ∥·∥F represents the Frobenius norm of a matrix; A represents the weight matrix of the filter layer; k represents the smaller of the number of rows and the number of columns of the weight matrix; and v is a small constant ensuring the logarithmic function is well-defined, with a value greater than 0 and far less than 1.
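The filter-layer training of steps S3-S4 can be sketched as follows. This is a minimal illustration assuming PyTorch; all names (`FilterLayer`, `cond_penalty`, `train_filter`) and the exact penalty form are assumptions for illustration, not taken verbatim from the claim.

```python
# Hypothetical sketch of steps S3-S4: a trainable filter layer is placed in
# front of a frozen classifier Mn, and trained on adversarial samples Da with
# cross-entropy plus a differentiable condition-number penalty on the filter's
# weight matrix.
import torch
import torch.nn as nn

class FilterLayer(nn.Module):
    """Linear spatial transform of flattened input images (step S3)."""
    def __init__(self, dim):
        super().__init__()
        # Start near the identity: initially an almost-no-op that is well
        # conditioned, hence invertible (no image information is lost).
        self.weight = nn.Parameter(torch.eye(dim) + 0.01 * torch.randn(dim, dim))

    def forward(self, x):
        b = x.shape[0]
        return (x.reshape(b, -1) @ self.weight.T).reshape(x.shape)

def cond_penalty(A, v=1e-6):
    """Differentiable surrogate limiting the 2-norm condition number of A.

    Assumed form log(||A||_F^2 * ||A^-1||_F^2 - k^2 + v), built from exactly
    the quantities the claim names (Frobenius norm, weight matrix A, k = the
    smaller matrix dimension, small constant v > 0). The product of squared
    Frobenius norms attains its minimum k^2 precisely when all singular
    values of A are equal, i.e. when the 2-norm condition number is 1.
    """
    k = min(A.shape)
    prod = A.norm('fro') ** 2 * torch.linalg.inv(A).norm('fro') ** 2
    return torch.log(prod - k ** 2 + v)

def train_filter(model_mn, filt, adv_loader, lam=0.1, epochs=1):
    """Step S4: freeze the Mn part of Mnc and train only the filter layer."""
    for p in model_mn.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(filt.parameters(), lr=1e-3)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_adv, y in adv_loader:          # adversarial images, clean labels
            logits = model_mn(filt(x_adv))
            loss = ce(logits, y) + lam * cond_penalty(filt.weight)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

After training, step S5 amounts to reusing the same filter with its parameters fixed in front of the adversarially trained model, e.g. `mac = nn.Sequential(filt, model_ma)`.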