| CPC G06V 10/778 (2022.01) [G06V 10/751 (2022.01); G06V 10/761 (2022.01)] | 3 Claims |

1. A method for re-recognizing an object image based on multi-feature information capture and correlation analysis, comprising:
a) collecting a plurality of object images to form an object image re-recognition database, labeling identifier (ID) information of an object image in the object image re-recognition database, and dividing the object image re-recognition database into a training set and a test set;
b) establishing an object image re-recognition model by using the multi-feature information capture and correlation analysis;
c) optimizing an objective function of the object image re-recognition model by using a cross-entropy loss function and a triplet loss function to obtain an optimized object image re-recognition model;
d) marking the object images with the ID information to obtain marked object images, inputting the marked object images into the optimized object image re-recognition model in step c) for training to obtain a trained object image re-recognition model and storing the trained object image re-recognition model;
e) inputting a to-be-retrieved object image into the trained object image re-recognition model in step d) to obtain a feature of a to-be-retrieved object; and
f) comparing the feature of the to-be-retrieved object with features of the object images in the test set and sorting comparison results by a similarity measurement,
wherein step b) comprises the following steps:
b-1) setting an image input network to two branch networks comprising a first feature branch network and a second feature branch network;
b-2) inputting an object image h in the training set into the first feature branch network, wherein h∈ℝ^(e×w×3), ℝ represents a real number space, e represents a number of horizontal pixels of the object image h, w represents a number of vertical pixels of the object image h, and 3 represents a number of channels of each red, green, and blue (RGB) image; processing the object image h by using a convolutional layer to obtain a feature map f; processing the feature map f by using a channel attention mechanism; performing a global average pooling and a global maximum pooling on the feature map f to obtain two one-dimensional vectors; normalizing the two one-dimensional vectors through a convolution, a Rectified Linear Unit (ReLU) activation function, a 1*1 convolution, and sigmoid function operations in turn to weight the feature map f to obtain a weighted feature map f; performing a maximum pooling and an average pooling on all channels at each position in the weighted feature map f by using a spatial attention mechanism to obtain a maximum pooled feature map and an average pooled feature map; stitching the maximum pooled feature map and the average pooled feature map to obtain a stitched feature map; performing a 7*7 convolution on the stitched feature map, and then normalizing the stitched feature map by using a batch normalization layer and a sigmoid function to obtain a normalized stitched feature map; and multiplying the normalized stitched feature map by the feature map f to obtain a new feature;
b-3) inputting the object image h in the training set into the second feature branch network, wherein h∈ℝ^(e×w×3); dividing the image h into n two-dimensional blocks; representing embeddings of the two-dimensional blocks as a one-dimensional vector hl∈ℝ^(n×(p²·3)) by using a linear transformation layer, wherein p represents a resolution of an image block, and n=ew/p²; calculating an average embedding ha of all the two-dimensional blocks according to a formula ha=(1/n)(h1+h2+ . . . +hn), wherein hi represents an embedding of an ith block obtained through a Gaussian distribution initialization, and i∈{1, . . . , n}; calculating an attention coefficient ai of the ith block according to a formula ai=qTσ(W1h0+W2hi+W3ha), wherein qT represents a weight, σ represents the sigmoid function, h0 represents a class marker, and W1, W2, and W3 are weights; calculating a new embedding hl of each of the two-dimensional blocks according to a formula; and calculating a new class marker h′0 according to a formula h′0=W4[h0∥h1], wherein W4 represents a weight;
b-4) taking the new class marker h′0 and a sequence with an input size of hl∈ℝ^(n×dc) as an overall representation of a new image, wherein dc=d*m, d represents a dimension size of a head of each self-attention mechanism in a multi-head attention mechanism, and m represents a number of heads of the multi-head attention mechanism; adding position information to the new image, and then taking the new image as an input of a transformer encoder to complete the establishment of the object image re-recognition model.
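The sketches that follow illustrate one possible PyTorch reading of the steps recited in claim 1. None of them is the patented implementation; every layer size, margin, and hyperparameter is an assumption introduced only for illustration. The channel and spatial attention recited in step b-2) follows a squeeze-and-weight pattern: pooled channel statistics pass through a small 1*1-convolution MLP with a sigmoid, and per-position max/average maps pass through a 7*7 convolution with batch normalization and a sigmoid. In the sketch below, the reduction ratio and the choice to apply the spatial map to the already channel-weighted feature are assumptions, not taken from the claim.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of the channel + spatial attention described in step b-2)."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        # Channel attention: global avg/max pooling -> 1*1 conv, ReLU, 1*1 conv -> sigmoid
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Spatial attention: 7*7 conv over the stitched max/avg maps, then BN + sigmoid
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)
        self.spatial_bn = nn.BatchNorm2d(1)

    def forward(self, f):
        # Channel attention weights the feature map f
        ca = torch.sigmoid(self.mlp(self.avg_pool(f)) + self.mlp(self.max_pool(f)))
        f_weighted = f * ca
        # Spatial attention: max and average over all channels at each position
        max_map, _ = f_weighted.max(dim=1, keepdim=True)
        avg_map = f_weighted.mean(dim=1, keepdim=True)
        stitched = torch.cat([max_map, avg_map], dim=1)
        sa = torch.sigmoid(self.spatial_bn(self.spatial_conv(stitched)))
        # Multiply the normalized map by the (weighted) feature map to obtain the new feature;
        # applying it to the weighted map rather than the raw f is an assumed reading.
        return f_weighted * sa
```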
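Step b-3) splits the image into p*p blocks, embeds each block with a linear transformation, and scores each block against the class marker with ai=qTσ(W1h0+W2hi+W3ha). A minimal sketch follows; the patch size, embedding width, and the softmax-weighted aggregation used before forming the new class marker are assumptions, since the claim does not spell out the per-block update formula.

```python
import torch
import torch.nn as nn

class PatchAttentionEmbed(nn.Module):
    """Sketch of the second feature branch in step b-3)."""
    def __init__(self, p=16, in_ch=3, dim=768):  # p and dim are illustrative choices
        super().__init__()
        self.p = p
        self.proj = nn.Linear(p * p * in_ch, dim)        # linear transformation layer
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))  # class marker h0
        self.W1 = nn.Linear(dim, dim, bias=False)
        self.W2 = nn.Linear(dim, dim, bias=False)
        self.W3 = nn.Linear(dim, dim, bias=False)
        self.q = nn.Parameter(torch.randn(dim))          # weight q
        self.W4 = nn.Linear(2 * dim, dim, bias=False)    # weight for the new class marker

    def forward(self, h):
        # h: (batch, 3, e, w); e and w are assumed divisible by p
        b, c, e, w = h.shape
        p = self.p
        # Divide the image into n = e*w / p^2 two-dimensional blocks and flatten each
        blocks = h.unfold(2, p, p).unfold(3, p, p)                  # (b, c, e/p, w/p, p, p)
        blocks = blocks.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        hi = self.proj(blocks)                                      # block embeddings h_i
        h0 = self.cls.expand(b, -1, -1)                             # class marker h0
        ha = hi.mean(dim=1, keepdim=True)                           # average embedding h_a
        # Attention coefficient a_i = q^T * sigmoid(W1 h0 + W2 h_i + W3 h_a)
        a = torch.sigmoid(self.W1(h0) + self.W2(hi) + self.W3(ha)) @ self.q  # (b, n)
        # ASSUMPTION: an attention-weighted aggregation of the blocks, used here only to
        # illustrate how a new class marker h'_0 = W4[h0 || .] could be formed.
        agg = (a.softmax(dim=1).unsqueeze(-1) * hi).sum(dim=1, keepdim=True)
        h0_new = self.W4(torch.cat([h0, agg], dim=-1))              # new class marker h'_0
        return h0_new, hi, a
```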
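Step b-4) prepends the new class marker to the block embeddings, adds position information, and feeds the resulting sequence to a transformer encoder. A sketch under an assumed encoder depth and head count m:

```python
import torch
import torch.nn as nn

class ReIDTransformerHead(nn.Module):
    """Sketch of step b-4): class marker + block sequence -> transformer encoder."""
    def __init__(self, n_blocks, dim=768, m=12, depth=4):  # dim, m, depth are assumptions
        super().__init__()
        # Learnable position information for the class marker plus n blocks
        self.pos = nn.Parameter(torch.zeros(1, n_blocks + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=m, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, cls_token, block_embeds):
        # Overall representation of the new image: [h'_0 ; h_l] plus position information
        x = torch.cat([cls_token, block_embeds], dim=1) + self.pos
        x = self.encoder(x)
        return x[:, 0]  # encoded class marker used as the re-recognition feature
```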
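Step c) optimizes the model with a cross-entropy loss and a triplet loss. A minimal sketch of such a combined objective, with an assumed margin and weighting factor:

```python
import torch.nn as nn

# Sketch of the objective in step c): identity classification plus a triplet loss.
ce_loss = nn.CrossEntropyLoss()
triplet_loss = nn.TripletMarginLoss(margin=0.3)  # margin is an assumed value

def reid_objective(logits, labels, anchor, positive, negative, w_tri=1.0):
    """Combined loss L = L_ce + w_tri * L_triplet; w_tri is an assumed weighting."""
    return ce_loss(logits, labels) + w_tri * triplet_loss(anchor, positive, negative)
```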
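Step f) compares the feature of the to-be-retrieved object against the test-set features and sorts the results by a similarity measurement. The claim does not name the measure; cosine similarity is assumed in the sketch below, and Euclidean distance would work analogously.

```python
import torch
import torch.nn.functional as F

def rank_gallery(query_feat, gallery_feats):
    """Sketch of step f): rank gallery (test-set) features by similarity to the query.

    query_feat: (d,) feature of the to-be-retrieved object
    gallery_feats: (N, d) features of the test-set object images
    """
    q = F.normalize(query_feat, dim=-1)
    g = F.normalize(gallery_feats, dim=-1)
    sims = g @ q                                   # cosine similarity per gallery image
    order = torch.argsort(sims, descending=True)   # most similar first
    return order, sims[order]
```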