US 11,783,037 B1
Defense method of deep learning model aiming at adversarial attacks
Jielong Guo, Quanzhou (CN); Xian Wei, Quanzhou (CN); Xuan Tang, Quanzhou (CN); Hui Yu, Quanzhou (CN); Dongheng Shao, Quanzhou (CN); Jianfeng Zhang, Quanzhou (CN); Jie Li, Quanzhou (CN); and Yanhui Huang, Quanzhou (CN)
Assigned to Quanzhou Equipment Manufacturing Research Institute, Quanzhou (CN)
Filed by Quanzhou Equipment Manufacturing Research Institute, Quanzhou (CN)
Filed on May 15, 2023, as Appl. No. 18/317,512.
Claims priority of application No. 202211321718.1 (CN), filed on Oct. 27, 2022.
Int. Cl. G06F 21/56 (2013.01); G06N 3/08 (2023.01); G06F 21/55 (2013.01)
CPC G06F 21/566 (2013.01) [G06F 21/552 (2013.01); G06N 3/08 (2013.01); G06F 2221/034 (2013.01)]
4 Claims
 
1. A defense method for a deep learning model against adversarial attacks, comprising the following steps:
step S1, obtaining an untrained initial deep learning model M0, initial sample image data Dn used for training, and adversarial sample image data Da based on Dn;
step S2, training the model M0 with the data Dn to obtain a model Mn, and training the model M0 with both the data Dn and the data Da to obtain a model Ma;
step S3, constructing a filter layer and embedding the filter layer in front of the input layer of the model Mn to obtain a new model Mnc, wherein the filter layer is used for spatially transforming input data from adversarial samples to initial samples without losing image information;
step S4, training the image classification ability of the model Mnc by using the data Da, wherein, during the training, the parameters of the part of Mnc that originally belonged to Mn are kept unchanged and only the filter layer parameters are trained; and wherein, during the training, in addition to using a cross entropy loss function to train the image classification ability of the model, a differentiable condition number constraint function is added to the loss function to limit the two-norm condition number of the weight matrix of the filter layer; and
step S5, extracting the filter layer trained in step S4 and inserting it in front of the input layer of the model Ma to obtain a deep learning model Mac for detecting initial images and images with adversarial attack noise and for obtaining a correct image classification;
wherein the differentiable condition number constraint function is as follows:

[Equation not reproduced in the source text.]
wherein ∥·∥F represents the Frobenius norm of a matrix; A represents the weight matrix of the filter layer; k represents the smaller of the length and the width of the weight matrix; and v is a small constant, greater than 0 and far less than 1, that ensures the logarithmic function is well defined.
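
For illustration only, the quantity limited in step S4 can be sketched in Python with NumPy. The two-norm condition number of a matrix is the ratio of its largest singular value to its smallest. The claimed constraint function is not reproduced in this text, so the penalty below is a hypothetical stand-in and not the claimed formula: it is assumed only because it uses the same ingredients named in the claim (the Frobenius norm of A, the value k, and a logarithm protected by a small constant v). The function names and the default value of v are likewise assumptions.

```python
import numpy as np


def two_norm_condition_number(A):
    """Ratio of the largest to the smallest singular value of A."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]


def illustrative_condition_penalty(A, v=1e-6):
    """Hypothetical smooth surrogate, NOT the formula of the claim.

    Compares the mean of the squared singular values (via the Frobenius
    norm) with their geometric mean (via a determinant). The gap is close
    to zero when all singular values are equal and grows as they spread.
    """
    k = min(A.shape)  # smaller of the two matrix dimensions
    # Build the k x k Gram matrix on the smaller side of A.
    G = A @ A.T if A.shape[0] <= A.shape[1] else A.T @ A
    mean_sq = np.linalg.norm(A, "fro") ** 2 / k
    # v keeps the logarithm argument positive when G is singular.
    return np.log(mean_sq) - np.log(np.linalg.det(G) + v) / k


well_conditioned = np.eye(3)
badly_conditioned = np.diag([10.0, 1.0, 0.1])

for name, A in [("identity", well_conditioned), ("skewed", badly_conditioned)]:
    print(name, two_norm_condition_number(A), illustrative_condition_penalty(A))
```

In this sketch the skewed matrix, whose condition number is 100, receives a much larger penalty than the identity matrix, whose condition number is 1. A term of this kind, added to the cross entropy loss, would push the filter layer toward a well-conditioned transform, that is, one that can be inverted stably and therefore does not discard image information.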