US 12,444,158 B2
Image semantic segmentation algorithm and system based on multi-channel deep weighted aggregation
Yongsheng Qi, Hohhot (CN); Peiliang Chen, Hohhot (CN); Liqiang Liu, Hohhot (CN); Yongting Li, Hohhot (CN); and Jianqiang Su, Hohhot (CN)
Assigned to INNER MONGOLIA UNIVERSITY OF TECHNOLOGY, Hohhot (CN)
Filed by INNER MONGOLIA UNIVERSITY OF TECHNOLOGY, Hohhot (CN)
Filed on Feb. 3, 2023, as Appl. No. 18/163,918.
Claims priority of application No. 202210123937.2 (CN), filed on Feb. 10, 2022.
Prior Publication US 2023/0316699 A1, Oct. 5, 2023
Int. Cl. G06V 10/26 (2022.01); G06V 10/40 (2022.01); G06V 10/776 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01)
CPC G06V 10/26 (2022.01) [G06V 10/40 (2022.01); G06V 10/776 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01)]
7 Claims
 
1. A computer-implemented method for performing image semantic segmentation in complex real-world environments based on multi-channel deep weighted aggregation, comprising:
executing, by one or more processors, instructions stored on a non-transitory computer-readable storage medium, wherein the instructions cause the one or more processors to perform the following operations:
S1, semantic features with definite class information in an image, transition semantic features between low-level semantic and high-level semantic, and semantic features of context logic relationship in the image are extracted by a low-level semantic channel, an auxiliary semantic channel and a high-level semantic channel, respectively;
S2, the three different semantic features obtained in S1 are fused by weighted aggregation to obtain global semantic information of the image;
S3, the semantic features output from the respective semantic channels in S1 and the global semantic information in S2 are used to compute a loss function for training, wherein, in S1:
a shallow convolution structure network is used to construct the low-level semantic channel for extracting low-level semantic information, a depthwise separable convolution structure network is used to construct the auxiliary semantic channel, and transition semantic information obtained from the auxiliary semantic channel is fed back to the high-level semantic channel;
a deep convolution structure network is used to construct the high-level semantic channel for extracting high-level semantic information; and a process of extracting the low-level semantic information by the shallow convolution structure network includes:
LS(I_{H*W}) = S3(S2(I_{H*W}));
wherein LS(I_{H*W}) is the extraction process of the low-level semantic information, I_{H*W} is the input image array, and S is a convolution stride.
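
The weighted aggregation of step S2 can be pictured with a minimal sketch. It is illustrative only and is not taken from the claim: the function names `softmax` and `weighted_aggregate`, the use of one scalar weight per channel, the softmax normalization of those weights, and the requirement that the three feature maps already share one shape are all assumptions made for the example. The claim itself states only that the three semantic features are fused by weighted aggregation.

```python
import math


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_aggregate(features, raw_weights):
    """Fuse same-shaped 2D feature maps by an element-wise weighted sum.

    features    -- list of 2D lists (hypothetical stand-ins for the outputs of
                   the low-level, auxiliary and high-level semantic channels)
    raw_weights -- one raw scalar score per feature map (assumed; the claim
                   does not say how the weights are obtained)
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per feature map")
    rows, cols = len(features[0]), len(features[0][0])
    for fmap in features:
        if len(fmap) != rows or any(len(row) != cols for row in fmap):
            raise ValueError("all feature maps must share one shape")

    weights = softmax(raw_weights)
    fused = [[0.0] * cols for _ in range(rows)]
    for weight, fmap in zip(weights, features):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += weight * fmap[r][c]
    return fused


if __name__ == "__main__":
    low = [[1.0, 2.0], [3.0, 4.0]]
    aux = [[2.0, 2.0], [2.0, 2.0]]
    high = [[3.0, 2.0], [1.0, 0.0]]
    # Equal raw scores give equal weights, so the result is the plain mean.
    print(weighted_aggregate([low, aux, high], [0.0, 0.0, 0.0]))
```

In a trained network the raw scores would normally be learned parameters, and the channel outputs would first be resampled to a common resolution; both details are outside what this sketch covers.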