US 11,887,354 B2
Weakly supervised image semantic segmentation method, system and apparatus based on intra-class discriminator
Zhaoxiang Zhang, Beijing (CN); Tieniu Tan, Beijing (CN); Chunfeng Song, Beijing (CN); and Junsong Fan, Beijing (CN)
Assigned to INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES, Beijing (CN)
Appl. No. 17/442,697
Filed by INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES, Beijing (CN)
PCT Filed Jul. 2, 2020, PCT No. PCT/CN2020/099945
§ 371(c)(1), (2) Date Sep. 24, 2021.
PCT Pub. No. WO2021/243787, PCT Pub. Date Dec. 9, 2021.
Claims priority of application No. 202010506805.9 (CN), filed on Jun. 5, 2020.
Prior Publication US 2022/0180622 A1, Jun. 9, 2022
Int. Cl. G06V 10/40 (2022.01); G06V 10/764 (2022.01); G06T 7/174 (2017.01); G06V 10/774 (2022.01); G06V 20/70 (2022.01); G06V 10/776 (2022.01)
CPC G06V 10/765 (2022.01) [G06T 7/174 (2017.01); G06V 10/40 (2022.01); G06V 10/776 (2022.01); G06V 10/7747 (2022.01); G06V 20/70 (2022.01); G06T 2207/20021 (2013.01); G06T 2207/20081 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A weakly supervised image semantic segmentation method based on an intra-class discriminator, comprising:
extracting a feature image of a to-be-processed image through a feature extraction network, and obtaining an image semantic segmentation result of the to-be-processed image through an image semantic segmentation module, wherein the image semantic segmentation module is obtained through training based on a training image set and corresponding accurate pixel-level class labels;
wherein, the corresponding accurate pixel-level class labels are obtained through a first intra-class discriminator and a second intra-class discriminator based on the training image set and corresponding image-level class labels; the first intra-class discriminator and the second intra-class discriminator are separately constructed based on a deep network, and a method for training the first intra-class discriminator and the second intra-class discriminator comprises:
step S10: extracting a feature image of each image in the training image set through the feature extraction network to obtain a training feature image set, and constructing a first loss function of the first intra-class discriminator and a second loss function of the second intra-class discriminator, respectively;
step S20: training the first intra-class discriminator based on the training feature image set, the corresponding image-level class labels and the first loss function to obtain preliminary pixel-level foreground and background labels corresponding to all classes of each image in the training image set, wherein step S20 further comprises:
step S21: for each image-level class label c of each feature image in the training feature image set, setting a direction vector wc, using a pixel in a direction of the direction vector wc as a foreground pixel of a class c, and using a pixel in an opposite direction of the direction vector wc as a background pixel of the class c;
step S22: calculating a first loss value based on the direction vector wc and the training feature image set, and updating wc based on the first loss value; and
step S23: repeatedly performing step S21 and step S22 until a set first quantity of times of training is reached, wherein a trained first intra-class discriminator and the preliminary pixel-level foreground and background labels corresponding to all the classes of each image in the training image set are obtained;
step S30: training the second intra-class discriminator based on the training feature image set, the corresponding preliminary pixel-level foreground and background labels and the second loss function to obtain accurate pixel-level foreground and background labels corresponding to all the classes of each image in the training image set; and
step S40: generating the accurate pixel-level class labels based on the accurate pixel-level foreground and background labels corresponding to all the classes of each image in the training image set and the corresponding image-level class labels.
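The core of steps S21 to S23 is learning, for each image-level class c, a direction vector wc that splits the image's pixel features into foreground (features pointing along wc) and background (features pointing opposite wc). The sketch below is a minimal, hypothetical illustration of that idea using NumPy; the initialization, the specific loss, and the learning rate are assumptions for illustration, not the first loss function defined in the patent.

```python
import numpy as np

def train_intra_class_discriminator(features, lr=0.1, n_iters=100):
    """Hypothetical sketch of steps S21-S23 for one image of class c.

    features: (num_pixels, dim) array of pixel features extracted by the
    feature extraction network. Returns the learned direction vector w_c
    and preliminary foreground labels (True = foreground of class c).
    """
    # S21: set a direction vector w_c (here: initialized from the mean
    # feature, then kept at unit length -- an illustrative choice).
    w_c = features.mean(axis=0)
    w_c /= np.linalg.norm(w_c) + 1e-8

    # S23: repeat S21/S22 for a set number of iterations.
    for _ in range(n_iters):
        # Projection of each pixel feature onto w_c; the sign decides
        # which side of the separating direction the pixel falls on.
        scores = features @ w_c
        # S22: an assumed margin-style loss that pulls same-sign pixel
        # features toward w_c, sharpening the foreground/background split.
        signs = np.sign(scores)
        grad = -(signs[:, None] * features).mean(axis=0)
        w_c -= lr * grad
        w_c /= np.linalg.norm(w_c) + 1e-8  # keep w_c a unit direction

    # Preliminary pixel-level labels: foreground = same direction as w_c.
    foreground = (features @ w_c) > 0
    return w_c, foreground
```

In step S40, the per-class foreground masks produced this way would be combined with the image-level class labels (e.g., assigning each foreground pixel its class and the remaining pixels background) to form the pixel-level class labels used to train the segmentation module.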