CPC G06F 21/566 (2013.01) [G06F 21/552 (2013.01); G06N 3/08 (2013.01); G06F 2221/034 (2013.01)] | 4 Claims |
1. A defense method for a deep learning model against adversarial attacks, comprising the following steps:
step S1, obtaining an untrained initial deep learning model M0, initial sample image data Dn used for training, and adversarial sample image data Da generated from Dn;
step S2, training the model M0 with the data Dn to obtain a model Mn, and training the model M0 with the data Dn and the data Da to obtain a model Ma;
step S3, constructing a filter layer and embedding it in front of an input layer of the model Mn to obtain a new model Mnc, wherein the filter layer spatially transforms input data from the adversarial sample toward the initial sample without losing image information;
step S4, training an image classification ability of the model Mnc by using the data Da, keeping the model parameters of the part of Mnc that originally belonged to Mn unchanged during the training, and training only the filter layer parameters; during the training, in addition to using a cross-entropy loss function to train the image classification ability of the model, adding a differentiable condition number constraint function to the loss function to limit the two-norm condition number of the weight matrix of the filter layer; and
step S5, taking the filter layer trained in step S4 out separately and inserting it in front of an input layer of the model Ma to obtain a deep learning model Mac that handles both initial images and images carrying adversarial attack noise and obtains a correct image classification;
wherein the differentiable condition number constraint function is as follows:

F(A) = ln(∥A∥F^2 · ∥A^(-1)∥F^2 - k^2 + v)
wherein ∥·∥F represents the Frobenius norm of a matrix; A represents the weight matrix of the filter layer; k represents the smaller of the number of rows and the number of columns of the weight matrix; and v is a small constant ensuring the logarithmic function is well-defined, with a value greater than 0 and far less than 1.
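The filter-layer training of steps S3-S4 can be sketched as follows. This is a minimal illustration assuming PyTorch; all names (`FilterLayer`, `cond_penalty`, `train_filter`) and the exact penalty form are assumptions for illustration, not taken verbatim from the claim.

```python
# Hypothetical sketch of steps S3-S4: a trainable filter layer is placed in
# front of a frozen classifier Mn, and trained on adversarial samples Da with
# cross-entropy plus a differentiable condition-number penalty on the filter's
# weight matrix.
import torch
import torch.nn as nn

class FilterLayer(nn.Module):
    """Linear spatial transform of flattened input images (step S3)."""
    def __init__(self, dim):
        super().__init__()
        # Start near the identity: initially an almost-no-op that is well
        # conditioned, hence invertible (no image information is lost).
        self.weight = nn.Parameter(torch.eye(dim) + 0.01 * torch.randn(dim, dim))

    def forward(self, x):
        b = x.shape[0]
        return (x.reshape(b, -1) @ self.weight.T).reshape(x.shape)

def cond_penalty(A, v=1e-6):
    """Differentiable surrogate limiting the 2-norm condition number of A.

    Assumed form log(||A||_F^2 * ||A^-1||_F^2 - k^2 + v), built from exactly
    the quantities the claim names (Frobenius norm, weight matrix A, k = the
    smaller matrix dimension, small constant v > 0). The product of squared
    Frobenius norms attains its minimum k^2 precisely when all singular
    values of A are equal, i.e. when the 2-norm condition number is 1.
    """
    k = min(A.shape)
    prod = A.norm('fro') ** 2 * torch.linalg.inv(A).norm('fro') ** 2
    return torch.log(prod - k ** 2 + v)

def train_filter(model_mn, filt, adv_loader, lam=0.1, epochs=1):
    """Step S4: freeze the Mn part of Mnc and train only the filter layer."""
    for p in model_mn.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(filt.parameters(), lr=1e-3)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_adv, y in adv_loader:          # adversarial images, clean labels
            logits = model_mn(filt(x_adv))
            loss = ce(logits, y) + lam * cond_penalty(filt.weight)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

After training, step S5 amounts to reusing the same filter with its parameters fixed in front of the adversarially trained model, e.g. `mac = nn.Sequential(filt, model_ma)`.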