CPC G06V 10/765 (2022.01) [G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/70 (2022.01)] | 6 Claims |
1. An image classification method for maximizing mutual information, comprising:
acquiring training images;
maximizing mutual information between the training images and a neural network architecture, and automatically determining the neural network architecture and parameters of the neural network;
processing image data to be classified using the obtained neural network to obtain an image classification result; and
dividing the acquired training images into two parts; wherein the maximizing the mutual information between the training images and the neural network architecture, and automatically determining the neural network architecture and parameters of the neural network, comprises:
constructing a super-network and an architecture-generating network, respectively performing data processing thereon to obtain network parameters of the super-network and parameters of the architecture-generating network, and constructing a target network; and
inputting all training images into the target network to generate a predicted image category label, calculating a cross entropy loss of the image classification according to the predicted image category label and a real image category label, and training the target network until convergence for image classification;
wherein the constructing a super-network and an architecture-generating network, respectively performing data processing thereon to obtain the network parameters of the super-network and the parameters of the architecture-generating network, and constructing a target network comprises:
S1: constructing cells based on all possible image classification operations, and
constructing a super-network with the cells, wherein
the super-network is formed by stacking the cells containing all possible image classification operations;
S2: constructing an architecture-generating network based on a convolutional neural network, sampling from a standard Gaussian distribution to obtain a sampling value as an input of the architecture-generating network, and obtaining an output of the architecture-generating network through forward propagation;
sampling noise from the standard Gaussian distribution; and
summing the output of the architecture-generating network and the sampled noise to obtain an architecture parameter of the super-network;
S3: inputting a first part of the training images into the super-network to generate a predicted category label;
calculating an image classification cross entropy loss according to the predicted category label and the real category label; and
updating the network parameters of the super-network according to the image classification cross entropy loss with a gradient descent method;
S4: inputting a second part of the training images into the super-network, maximizing the mutual information between the image data and the architecture parameter of the super-network, and determining a lower bound of the mutual information, wherein
the lower bound of the mutual information is a cross entropy loss between the posterior distribution of the architecture parameter and the posterior distribution of the image data; the cross entropy loss is calculated, and the parameters of the architecture-generating network are updated with the gradient descent method; and
repeating S2-S4 to iteratively update the network parameters of the super-network and the parameters of the architecture-generating network until convergence, and stacking the updated cells to construct the target network.
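The sampling of step S2 can be sketched as follows. This is a toy illustration only: the claim's convolutional architecture-generating network is replaced by a hypothetical linear map `W`, and the cell sizes (`NUM_EDGES`, `NUM_OPS`) are assumed for the example; the claimed relation is simply that the architecture parameter is the generator output plus Gaussian noise, α = g(z) + ε.

```python
import math
import random

random.seed(0)

NUM_EDGES, NUM_OPS = 4, 3  # hypothetical cell: 4 edges, 3 candidate operations

# Hypothetical generator weights, standing in for the convolutional
# architecture-generating network of step S2.
W = [[random.gauss(0.0, 0.1) for _ in range(NUM_OPS)] for _ in range(NUM_EDGES)]

def sample_architecture_params():
    """S2: sample z ~ N(0, 1), forward-propagate through the generator,
    then add Gaussian noise eps to obtain the super-network's
    architecture parameters alpha = g(z) + eps."""
    z = [random.gauss(0.0, 1.0) for _ in range(NUM_EDGES)]
    g_out = [[z[e] * W[e][o] for o in range(NUM_OPS)] for e in range(NUM_EDGES)]
    eps = [[random.gauss(0.0, 1.0) for _ in range(NUM_OPS)] for e in range(NUM_EDGES)]
    return [[g_out[e][o] + eps[e][o] for o in range(NUM_OPS)] for e in range(NUM_EDGES)]

def softmax(row):
    # Numerically stable softmax over one edge's candidate operations.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [v / s for v in exps]

alpha = sample_architecture_params()
# Mixing weights over the candidate operations on each edge of the cell.
mix = [softmax(row) for row in alpha]
```

In a real super-network these mixing weights would weight the outputs of the candidate operations on each edge; here they only demonstrate the shape of the sampled parameter.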
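The alternating structure of steps S2-S4 can be sketched at a scalar level. Everything here is a stand-in: `w` plays the role of the super-network's weights, `theta` the generator's parameters, and the two quadratic losses are hypothetical surrogates for the cross entropy loss (S3) and the mutual-information lower bound (S4); only the control flow (sample architecture, update weights on split one, update generator on split two, repeat) mirrors the claim.

```python
import random

random.seed(1)

def sketch_training_loop(num_iters=50, lr=0.1):
    """Toy alternating optimization of S2-S4.

    Each iteration: S2 samples an architecture parameter alpha from the
    generator output plus Gaussian noise; S3 takes a gradient step on the
    super-network 'weight' w using a surrogate loss on data split one;
    S4 takes a gradient step on the generator 'parameter' theta using a
    surrogate mutual-information lower bound on data split two.
    """
    w, theta = 1.0, 1.0
    for _ in range(num_iters):
        # S2: alpha = g(z) + eps, with g a scalar multiplication here.
        alpha = theta * random.gauss(0.0, 1.0) + random.gauss(0.0, 1.0)
        # S3: gradient descent on a toy loss (w - alpha)^2 for the weights.
        w -= lr * 2.0 * (w - alpha)
        # S4: gradient descent on a toy surrogate (theta - w)^2 for the generator.
        theta -= lr * 2.0 * (theta - w)
    return w, theta

w, theta = sketch_training_loop()
```

After the loop converges, the claim stacks the updated cells into the target network and trains it to convergence on all training images with the cross entropy loss.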