US 12,260,626 B2
Method for re-recognizing object image based on multi-feature information capture and correlation analysis
Xiushan Nie, Jinan (CN); Xue Zhang, Jinan (CN); Chuntao Wang, Jinan (CN); Peng Tao, Jinan (CN); and Xiaofeng Li, Jinan (CN)
Assigned to SHANDONG JIANZHU UNIVERSITY, Jinan (CN)
Filed by SHANDONG JIANZHU UNIVERSITY, Jinan (CN)
Filed on Jul. 29, 2022, as Appl. No. 17/876,585.
Application 17/876,585 is a continuation in part of application No. PCT/CN2022/070929, filed on Jan. 10, 2022.
Claims priority of application No. 202110732494.2 (CN), filed on Jun. 29, 2021.
Prior Publication US 2022/0415027 A1, Dec. 29, 2022
Int. Cl. G06V 10/778 (2022.01); G06V 10/74 (2022.01); G06V 10/75 (2022.01)
CPC G06V 10/778 (2022.01) [G06V 10/751 (2022.01); G06V 10/761 (2022.01)] 3 Claims
OG exemplary drawing
 
1. A method for re-recognizing an object image based on multi-feature information capture and correlation analysis, comprising:
a) collecting a plurality of object images to form an object image re-recognition database, labeling identifier (ID) information of an object image in the object image re-recognition database, and dividing the object image re-recognition database into a training set and a test set;
b) establishing an object image re-recognition model by using the multi-feature information capture and correlation analysis;
c) optimizing an objective function of the object image re-recognition model by using a cross-entropy loss function and a triplet loss function to obtain an optimized object image re-recognition model;
d) marking the object images with the ID information to obtain marked object images, inputting the marked object images into the optimized object image re-recognition model in step c) for training to obtain a trained object image re-recognition model and storing the trained object image re-recognition model;
e) inputting a to-be-retrieved object image into the trained object image re-recognition model in step d) to obtain a feature of a to-be-retrieved object; and
f) comparing the feature of the to-be-retrieved object with features of the object images in the test set and sorting comparison results by a similarity measurement,
wherein step b) comprises the following steps:
b-1) setting an image input network to two branch networks comprising a first feature branch network and a second feature branch network;
b-2) inputting an object image h in the training set into the first feature branch network, wherein h∈ℝ^(e×w×3), ℝ represents a real number space, e represents a number of horizontal pixels of the object image h, w represents a number of vertical pixels of the object image h, and 3 represents a number of channels of each red, green, and blue (RGB) image; processing the object image h by using a convolutional layer to obtain a feature map f; processing the feature map f by using a channel attention mechanism; performing a global average pooling and a global maximum pooling on the feature map f to obtain two one-dimensional vectors; normalizing the two one-dimensional vectors through a convolution, a Rectified Linear Unit (ReLU) activation function, a 1×1 convolution, and a sigmoid function in turn to weight the feature map f and obtain a weighted feature map f; performing a maximum pooling and an average pooling on all channels at each position in the weighted feature map f by using a spatial attention mechanism to obtain a maximum pooled feature map and an average pooled feature map; stitching the maximum pooled feature map and the average pooled feature map to obtain a stitched feature map; performing a 7×7 convolution on the stitched feature map, and then normalizing the stitched feature map by using a batch normalization layer and a sigmoid function to obtain a normalized stitched feature map; and multiplying the normalized stitched feature map by the feature map f to obtain a new feature;
b-3) inputting the object image h in the training set into the second feature branch network, wherein h∈ℝ^(e×w×3); dividing the image h into n two-dimensional blocks; representing embeddings of the two-dimensional blocks as a one-dimensional vector h_l∈ℝ^(n×(p²·3)) by using a linear transformation layer, wherein p represents a resolution of an image block, and n=ew/p²; calculating an average embedding h_a of all the two-dimensional blocks according to a formula h_a = (1/n) Σ_{i=1}^{n} h_i, wherein h_i represents an embedding of an ith block obtained through a Gaussian distribution initialization, and i∈{1, . . . , n}; calculating an attention coefficient a_i of the ith block according to a formula a_i = q^T σ(W_1h_0 + W_2h_i + W_3h_a), wherein q^T represents a weight, σ represents the sigmoid function, h_0 represents a class marker, and W_1, W_2, and W_3 are weights; calculating a new embedding h_l of each of the two-dimensional blocks according to a formula

OG Complex Work Unit Math
 and calculating a new class marker h′_0 according to a formula h′_0 = W_4[h_0∥h_1], wherein W_4 represents a weight;
b-4) taking the new class marker h′_0 and a sequence with an input size of h_l∈ℝ^(n×d_c) as an overall representation of a new image, wherein d_c = d·m, d represents a dimension size of a head of each self-attention mechanism in a multi-head attention mechanism, and m represents a number of heads of the multi-head attention mechanism; adding position information to the new image, and then taking the new image as an input of a transformer encoder to complete the establishment of the object image re-recognition model.
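The channel and spatial attention weighting described in step b-2) can be illustrated with a minimal PyTorch sketch. The module below follows the claimed order of operations (global average and maximum pooling, 1×1 convolutions with ReLU and sigmoid, per-position maximum and average pooling, a 7×7 convolution with batch normalization and sigmoid); the reduction ratio, layer sizes, and the choice to apply the spatial weights to the channel-weighted map are illustrative assumptions, not specified by the patent.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of the channel + spatial attention weighting in step b-2).
    Reduction ratio and kernel sizes are assumptions, not from the patent."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared 1x1-conv branch for channel attention: conv -> ReLU -> 1x1 conv.
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # 7x7 convolution + batch normalization for the spatial branch.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.spatial_bn = nn.BatchNorm2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # --- channel attention: global average and global maximum pooling ---
        avg = torch.mean(f, dim=(2, 3), keepdim=True)              # (B, C, 1, 1)
        mx = torch.amax(f, dim=(2, 3), keepdim=True)               # (B, C, 1, 1)
        channel_w = self.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        f_weighted = f * channel_w                                  # weighted feature map

        # --- spatial attention: per-position maximum and average over channels ---
        max_map, _ = torch.max(f_weighted, dim=1, keepdim=True)    # (B, 1, H, W)
        avg_map = torch.mean(f_weighted, dim=1, keepdim=True)      # (B, 1, H, W)
        stitched = torch.cat([max_map, avg_map], dim=1)            # stitched feature map
        spatial_w = self.sigmoid(self.spatial_bn(self.spatial_conv(stitched)))

        # multiply the normalized spatial map back onto the feature map
        return f_weighted * spatial_w
```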
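Step b-3) describes linear block embeddings, an average embedding h_a, attention coefficients a_i relating each block to the class marker h_0 and to h_a, and a new class marker formed with the weight W_4. The per-block update and the second argument of the concatenation appear only as formula images in the gazette, so the sketch below adopts one plausible reading: each block embedding is scaled by its coefficient, their sum is concatenated with the class marker, and W_4 projects the result. All layer names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class BlockAttentionEmbedding(nn.Module):
    """One plausible reading of step b-3): block embeddings, coefficients
    a_i = q^T * sigmoid(W1 h_0 + W2 h_i + W3 h_a), and a new class marker."""
    def __init__(self, patch_size: int, dim: int):
        super().__init__()
        self.p = patch_size
        self.proj = nn.Linear(patch_size * patch_size * 3, dim)   # linear transformation layer
        self.h0 = nn.Parameter(torch.randn(1, 1, dim))            # class marker h_0
        self.W1 = nn.Linear(dim, dim, bias=False)
        self.W2 = nn.Linear(dim, dim, bias=False)
        self.W3 = nn.Linear(dim, dim, bias=False)
        self.W4 = nn.Linear(2 * dim, dim, bias=False)
        self.q = nn.Parameter(torch.randn(dim))                   # weight q

    def forward(self, h: torch.Tensor):
        # h: (B, e, w, 3) -> n = e*w/p^2 flattened blocks of size p*p*3
        B, e, w, _ = h.shape
        p = self.p
        blocks = h.reshape(B, e // p, p, w // p, p, 3).permute(0, 1, 3, 2, 4, 5)
        blocks = blocks.reshape(B, -1, p * p * 3)
        h_i = self.proj(blocks)                                   # block embeddings (B, n, dim)
        h_a = h_i.mean(dim=1, keepdim=True)                       # average embedding h_a
        h_0 = self.h0.expand(B, -1, -1)                           # class marker

        # attention coefficient a_i = q^T * sigmoid(W1 h_0 + W2 h_i + W3 h_a)
        a = torch.sigmoid(self.W1(h_0) + self.W2(h_i) + self.W3(h_a)) @ self.q   # (B, n)
        h_l = a.unsqueeze(-1) * h_i                               # new per-block embeddings (assumed form)
        # new class marker h'_0; the aggregate passed to W4 is an assumption
        h0_new = self.W4(torch.cat([h_0, h_l.sum(dim=1, keepdim=True)], dim=-1))
        return h0_new, h_l
```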
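Step b-4) then feeds the new class marker and the n block tokens of dimension d_c = d·m, plus position information, into a transformer encoder. A minimal sketch, assuming learned position embeddings and PyTorch's standard encoder layers; the depth and head count are illustrative values.

```python
import torch
import torch.nn as nn

class TransformerBranch(nn.Module):
    """Sketch of step b-4): concatenate the class marker with the n block tokens,
    add position information, and run a transformer encoder (d_c = d * m)."""
    def __init__(self, n_blocks: int, d: int = 64, m: int = 8, depth: int = 6):
        super().__init__()
        d_c = d * m
        self.pos = nn.Parameter(torch.zeros(1, n_blocks + 1, d_c))     # position information
        layer = nn.TransformerEncoderLayer(d_model=d_c, nhead=m, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, cls_token: torch.Tensor, block_tokens: torch.Tensor) -> torch.Tensor:
        # cls_token: (B, 1, d_c); block_tokens: (B, n, d_c)
        x = torch.cat([cls_token, block_tokens], dim=1) + self.pos
        return self.encoder(x)[:, 0]                                   # class-token feature
```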
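Step c) optimizes the objective with a cross-entropy loss and a triplet loss. Below is a minimal sketch of such a combined objective in PyTorch; the margin, the weighting factor, and the mining strategy are assumptions, since the claim does not specify them.

```python
import torch
import torch.nn as nn

# Cross-entropy (ID classification) term and triplet term, as named in step c).
id_loss = nn.CrossEntropyLoss()
triplet_loss = nn.TripletMarginLoss(margin=0.3)   # margin value is an assumption

def reid_objective(logits, labels, anchor, positive, negative, w_tri: float = 1.0):
    """Total loss = cross-entropy over ID predictions + weighted triplet loss."""
    return id_loss(logits, labels) + w_tri * triplet_loss(anchor, positive, negative)
```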
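Step f) compares the query feature with the test-set features and sorts the results by a similarity measurement. The claim does not fix the metric; the sketch below uses cosine similarity as one common choice for re-identification retrieval.

```python
import torch
import torch.nn.functional as F

def rank_gallery(query_feat: torch.Tensor, gallery_feats: torch.Tensor):
    """Rank gallery (test-set) images by cosine similarity to the query feature."""
    q = F.normalize(query_feat, dim=-1)            # (d,)
    g = F.normalize(gallery_feats, dim=-1)         # (N, d)
    sims = g @ q                                   # similarity of each gallery image to the query
    order = torch.argsort(sims, descending=True)   # most similar first
    return order, sims[order]
```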