US 12,293,191 B2
Fine-grained image recognition method and apparatus using graph structure represented high-order relation discovery
Jia Li, Beijing (CN); Yifan Zhao, Beijing (CN); Dingfeng Shi, Beijing (CN); and Qinping Zhao, Beijing (CN)
Assigned to BEIHANG UNIVERSITY, Beijing (CN)
Filed by BEIHANG UNIVERSITY, Beijing (CN)
Filed on Dec. 9, 2021, as Appl. No. 17/546,993.
Claims priority of application No. 202110567940.9 (CN), filed on May 24, 2021.
Prior Publication US 2022/0382553 A1, Dec. 1, 2022
Int. Cl. G06F 9/38 (2018.01); G06F 9/30 (2018.01); G06N 3/02 (2006.01)

CPC G06F 9/3836 (2013.01) [G06F 9/30036 (2013.01); G06N 3/02 (2013.01)]

20 Claims

1. A fine-grained image recognition method using graph structure represented high-order relation discovery, comprising:

inputting an image to be classified into a convolutional neural network feature extractor with multiple stages, and extracting two layers of network feature graphs X_i and Y_i in the last stage;

constructing a hybrid high-order attention module enhanced by a space-gated network according to the network feature graphs X_i and Y_i, and forming a high-order feature vector pool according to the hybrid high-order attention module;

using each vector in the high-order feature vector pool as a node to construct a graph neural network, and utilizing semantic similarity among high-order features to form representative vector nodes in groups; and

performing global pooling on the representative vector nodes to obtain classification vectors, and obtaining a fine-grained classification result through a fully connected layer and a classifier based on the classification vectors.
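
The four steps of claim 1 can be pictured with a minimal NumPy sketch. It is illustrative only and is not the patented implementation: the claim does not fix the arithmetic, so the sigmoid spatial gate, the softmax position attention, the cosine-similarity adjacency, the soft group assignment, the random weights, and every function and parameter name below (high_order_vector_pool, group_nodes, classify, init_params, W_embed, W_group, W_fc, b_fc) are assumptions chosen for readability. The two feature graphs X_i and Y_i are stood in for by random arrays rather than by a real convolutional backbone.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def high_order_vector_pool(X, Y):
    """Assumed form of the gated high-order attention step.

    X, Y: feature graphs of shape (C, H, W) from two layers of the last stage.
    Returns a pool of C vectors, each of length C.
    """
    C, H, W = X.shape
    Xf, Yf = X.reshape(C, H * W), Y.reshape(C, H * W)
    # Assumed spatial gate: one sigmoid weight per spatial position.
    gate = 1.0 / (1.0 + np.exp(-Yf.mean(axis=0)))      # (HW,)
    # Assumed position-to-position attention across the two layers.
    attn = softmax(Xf.T @ Yf / np.sqrt(C), axis=-1)     # (HW, HW)
    enhanced = (Yf * gate) @ attn.T                     # (C, HW)
    # Bilinear (second-order) interaction; each row is one pool vector.
    return Xf @ enhanced.T / (H * W)                    # (C, C)

def group_nodes(nodes, W_embed, W_group):
    """Assumed graph step: similarity adjacency, one propagation, soft grouping."""
    unit = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-8)
    adj = softmax(unit @ unit.T, axis=-1)               # (N, N) semantic similarity
    hidden = np.maximum(adj @ nodes @ W_embed, 0.0)     # (N, D) after ReLU
    assign = softmax(hidden @ W_group, axis=-1)         # (N, K) soft group membership
    return assign.T @ hidden                            # (K, D) representative nodes

def init_params(C, D, K, num_classes, seed=0):
    # Random weights stand in for trained parameters (hypothetical names).
    rng = np.random.default_rng(seed)
    return {
        "W_embed": rng.standard_normal((C, D)) * 0.1,
        "W_group": rng.standard_normal((D, K)) * 0.1,
        "W_fc": rng.standard_normal((D, num_classes)) * 0.1,
        "b_fc": np.zeros(num_classes),
    }

def classify(X, Y, params):
    pool = high_order_vector_pool(X, Y)                 # high-order feature vector pool
    reps = group_nodes(pool, params["W_embed"], params["W_group"])
    vec = reps.mean(axis=0)                             # global pooling over groups
    logits = vec @ params["W_fc"] + params["b_fc"]      # fully connected layer
    return softmax(logits)                              # classifier probabilities

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((8, 4, 4))                  # stand-in for X_i
    Y = rng.standard_normal((8, 4, 4))                  # stand-in for Y_i
    probs = classify(X, Y, init_params(C=8, D=16, K=4, num_classes=5))
    print(probs.shape, float(probs.sum()))
```

In this sketch the number of graph nodes equals the channel count C and the number of representative nodes equals K, so the classification vector has a fixed length D regardless of the input resolution.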

10. A fine-grained image recognition device using graph structure represented high-order relation discovery, comprising:

at least one processor; and

a memory storing computer-executable instructions, wherein the computer-executable instructions are executed by the at least one processor, to enable the at least one processor to:

input an image to be classified into a convolutional neural network feature extractor with multiple stages, and extract two layers of network feature graphs X_i and Y_i in the last stage;

construct a hybrid high-order attention module enhanced by a space-gated network according to the two layers of network feature graphs X_i and Y_i, and form a high-order feature vector pool according to the hybrid high-order attention module;

use each vector in the high-order feature vector pool as a node to construct a graph neural network, and utilize a semantic similarity among high-order features to form representative vector nodes in groups; and

perform global pooling on the representative vector nodes to obtain classification vectors, and obtain a fine-grained classification result through a fully connected layer and a classifier based on the classification vectors.

19. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program is executed by a computer to:

input an image to be classified into a convolutional neural network feature extractor with multiple stages, and extract two layers of network feature graphs X_i and Y_i in the last stage;