US 11,887,354 B2
Weakly supervised image semantic segmentation method, system and apparatus based on intra-class discriminator
Zhaoxiang Zhang, Beijing (CN); Tieniu Tan, Beijing (CN); Chunfeng Song, Beijing (CN); and Junsong Fan, Beijing (CN)
Assigned to INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES, Beijing (CN)
Appl. No. 17/442,697
Filed by INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES, Beijing (CN)
PCT Filed Jul. 2, 2020, PCT No. PCT/CN2020/099945
§ 371(c)(1), (2) Date Sep. 24, 2021.
PCT Pub. No. WO2021/243787, PCT Pub. Date Dec. 9, 2021.
Claims priority of application No. 202010506805.9 (CN), filed on Jun. 5, 2020.
Prior Publication US 2022/0180622 A1, Jun. 9, 2022
Int. Cl. G06V 10/40 (2022.01); G06V 10/764 (2022.01); G06T 7/174 (2017.01); G06V 10/774 (2022.01); G06V 20/70 (2022.01); G06V 10/776 (2022.01)
CPC G06V 10/765 (2022.01) [G06T 7/174 (2017.01); G06V 10/40 (2022.01); G06V 10/776 (2022.01); G06V 10/7747 (2022.01); G06V 20/70 (2022.01); G06T 2207/20021 (2013.01); G06T 2207/20081 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A weakly supervised image semantic segmentation method based on an intra-class discriminator, comprising:
extracting a feature image of a to-be-processed image through a feature extraction network, and obtaining an image semantic segmentation result of the to-be-processed image through an image semantic segmentation module, wherein the image semantic segmentation module is obtained through training based on a training image set and corresponding accurate pixel-level class labels;
wherein, the corresponding accurate pixel-level class labels are obtained through a first intra-class discriminator and a second intra-class discriminator based on the training image set and corresponding image-level class labels; the first intra-class discriminator and the second intra-class discriminator are separately constructed based on a deep network, and a method for training the first intra-class discriminator and the second intra-class discriminator comprises:
step S10: extracting a feature image of each image in the training image set through the feature extraction network to obtain a training feature image set, and constructing a first loss function of the first intra-class discriminator and a second loss function of the second intra-class discriminator, respectively;
step S20: training the first intra-class discriminator based on the training feature image set, the corresponding image-level class labels and the first loss function to obtain preliminary pixel-level foreground and background labels corresponding to all classes of each image in the training image set, wherein step S20 further comprises:
step S21: for each image-level class label c of each feature image in the training feature image set, setting a direction vector wc, using a pixel in a direction of the direction vector wc as a foreground pixel of a class c, and using a pixel in an opposite direction of the direction vector wc as a background pixel of the class c;
step S22: calculating a first loss value based on the direction vector wc and the training feature image set, and updating wc based on the first loss value; and
step S23: repeatedly performing step S21 and step S22 until a set first quantity of times of training is reached, wherein a trained first intra-class discriminator and the preliminary pixel-level foreground and background labels corresponding to all the classes of each image in the training image set are obtained;
step S30: training the second intra-class discriminator based on the training feature image set, the corresponding preliminary pixel-level foreground and background labels and the second loss function to obtain accurate pixel-level foreground and background labels corresponding to all the classes of each image in the training image set; and
step S40: generating the accurate pixel-level class labels based on the accurate pixel-level foreground and background labels corresponding to all the classes of each image in the training image set and the corresponding image-level class labels.
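The core of steps S21 to S23 is learning, for each image-level class c, a direction vector wc that splits the image's pixel features into foreground (features pointing along wc) and background (features pointing opposite wc). The sketch below is a minimal, hypothetical illustration of that idea using NumPy; the initialization, the specific loss, and the learning rate are assumptions for illustration, not the first loss function defined in the patent.

```python
import numpy as np

def train_intra_class_discriminator(features, lr=0.1, n_iters=100):
    """Hypothetical sketch of steps S21-S23 for one image of class c.

    features: (num_pixels, dim) array of pixel features extracted by the
    feature extraction network. Returns the learned direction vector w_c
    and preliminary foreground labels (True = foreground of class c).
    """
    # S21: set a direction vector w_c (here: initialized from the mean
    # feature, then kept at unit length -- an illustrative choice).
    w_c = features.mean(axis=0)
    w_c /= np.linalg.norm(w_c) + 1e-8

    # S23: repeat S21/S22 for a set number of iterations.
    for _ in range(n_iters):
        # Projection of each pixel feature onto w_c; the sign decides
        # which side of the separating direction the pixel falls on.
        scores = features @ w_c
        # S22: an assumed margin-style loss that pulls same-sign pixel
        # features toward w_c, sharpening the foreground/background split.
        signs = np.sign(scores)
        grad = -(signs[:, None] * features).mean(axis=0)
        w_c -= lr * grad
        w_c /= np.linalg.norm(w_c) + 1e-8  # keep w_c a unit direction

    # Preliminary pixel-level labels: foreground = same direction as w_c.
    foreground = (features @ w_c) > 0
    return w_c, foreground
```

In step S40, the per-class foreground masks produced this way would be combined with the image-level class labels (e.g., assigning each foreground pixel its class and the remaining pixels background) to form the pixel-level class labels used to train the segmentation module.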