US 12,444,492 B2
Medical image segmentation method based on Boosting-Unet segmentation network
Qi Ye, Guangdong (CN); Lihui Wen, Guangdong (CN); Jiawei Chen, Guangdong (CN); and Chihua Fang, Guangdong (CN)
Assigned to SOUTH CHINA NORMAL UNIVERSITY, Guangdong (CN)
Filed by SOUTH CHINA NORMAL UNIVERSITY, Guangdong (CN)
Filed on Mar. 14, 2023, as Appl. No. 18/183,931.
Claims priority of application No. 202210502143.7 (CN), filed on May 10, 2022.
Prior Publication US 2023/0368896 A1, Nov. 16, 2023
Int. Cl. G06T 7/10 (2017.01); G06T 3/4007 (2024.01); G16H 30/40 (2018.01)
CPC G16H 30/40 (2018.01) [G06T 3/4007 (2013.01); G06T 7/10 (2017.01); G06T 2207/20084 (2013.01)] 1 Claim
OG exemplary drawing
 
1. A medical image segmentation method based on a Boosting-Unet segmentation network, comprising the following steps:
S1: acquiring a pancreatic cancer-related CT slice image data set, pre-processing the data set, and dividing the data set into a training set and a validation set, wherein the data set includes original medical images and corresponding known medical images with labeled segmentation;
S2: planning a number n of layers of an overall segmentation network according to a scale of the data set in S1, constructing single-layered segmentation networks, and constructing the overall segmentation network by utilizing the single-layered segmentation networks, wherein the overall segmentation network is an instance of the Boosting-Unet segmentation network;
S3: dividing the overall segmentation network obtained in S2 into m sub segmentation networks, where m is an integer greater than or equal to 1, and training the m sub segmentation networks by utilizing the training set; and
S4: inputting an original medical image of the validation set into the trained overall segmentation network obtained in S3, outputting image data results with labeled segmentation, and comparatively selecting an optimum of the trained overall segmentation network based on the image data results;
in the S1, the pre-processing of the data set comprises:
I: truncating and normalizing each medical image in the data set, with the specific formula below:

V′ = (V − a) / (b − a)

wherein V is single medical image data in the data set, a is a minimum value in the single medical image data, b is a maximum value in the single medical image data, and V′ is the medical image data re-generated by truncating and normalizing the single medical image data; the data set comprises the original medical image and a medical image with labeled segmentation corresponding to the original medical image;
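The truncation-and-normalization step above is ordinary min-max scaling, writing the re-generated image as V′. A minimal NumPy sketch follows; the CT intensity window used for truncation is an illustrative assumption, as the claim does not fix one:

```python
import numpy as np

def truncate_and_normalize(volume, lo=-100.0, hi=240.0):
    """Clip CT intensities to a window (lo/hi are illustrative HU bounds,
    not from the claim), then min-max normalize: V' = (V - a) / (b - a)."""
    v = np.clip(volume.astype(np.float64), lo, hi)
    a, b = v.min(), v.max()          # a: minimum, b: maximum of this image
    return (v - a) / (b - a)
```

After this step every voxel lies in [0, 1], which keeps the subsequent convolution inputs on a common scale across slices.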
in the S2, each of the single-layered segmentation networks is an ith layer of the overall segmentation network, wherein 1≤i≤n; each of the single-layered segmentation networks comprises encoding blocks and decoding blocks, the encoding blocks perform feature extraction on the data inputted to them, and the decoding blocks output image data with labeled segmentation; the overall segmentation network is constructed by utilizing the single-layered segmentation networks as follows: the inputted image data is inputted to the encoding blocks of the 1st layer; the encoding blocks of the 1st layer through the nth layer are unidirectionally connected in series, successively in an output-input order, to form an encoding path; outputs of the encoding blocks of the nth layer are unidirectionally connected to inputs of the decoding blocks of the nth layer; the decoding blocks of the nth layer through the 1st layer are unidirectionally connected in series, successively in an output-input order, to form a decoding path; finally, the decoding blocks of the 1st layer output the medical image data with labeled segmentation; in addition, the encoding blocks of each layer are in skip connection to the decoding blocks of the same layer;
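The connectivity described above can be written down as pure bookkeeping. The sketch below (an illustrative helper, not part of the claimed method) enumerates the encoding path, decoding path, and per-layer skip connections for an n-layer network as (block, layer) pairs:

```python
def boosting_unet_paths(n):
    """Connectivity of an n-layer network as described in the claim:
    encoder 1st -> nth layer, decoder nth -> 1st layer, plus a skip
    connection between the blocks of each layer. No tensors involved."""
    encode = [("enc", i) for i in range(1, n + 1)]            # encoding path
    decode = [("dec", i) for i in range(n, 0, -1)]            # decoding path
    skips = [(("enc", i), ("dec", i)) for i in range(1, n + 1)]
    return encode, decode, skips
```

For n = 3 this yields enc1→enc2→enc3, then dec3→dec2→dec1, with three skip pairs.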
constructing the encoding blocks of the ith layer, wherein 1≤i≤n, of the overall segmentation network comprises the following steps:
I: determining a number of needed convolution kernels, initializing the convolution kernels, and selecting an activation function and a pooling operation, specifically comprising the following operations:
selecting two convolution kernels, wherein parameters of each of the two convolution kernels are 3×3×3, i.e., three-dimensional matrix data with a height of 3, a width of 3 and a channel number of 3, and initializing the parameters of each convolution kernel in the form of a random decimal matrix; setting the number of layers of the convolution kernels to n×32, wherein n represents the nth layer of the overall segmentation network; setting the moving step length to one pixel step length and using zero-padding so that the size of the outputted image data remains unchanged; and selecting a rectified linear unit (ReLU) function as the activation function and selecting max pooling with a 2×2 kernel as the pooling operation;
II: establishing a feature extraction flow for the inputted image data;
the feature extraction flow for the inputted image data in the step II specifically comprises the following steps:
1. Performing a convolutional operation on the inputted image data of the ith layer with the convolution kernels to extract feature data;
2. Performing nonlinear activation on the extracted feature data by using the ReLU function; and
3. Reducing the size of the feature data by means of max pooling after the activation;
repeating the feature extraction flow according to the quantity of the encoding layers;
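One pass of the encoding flow above can be sketched in NumPy. This is a single-kernel, single-channel illustration under stated assumptions (a 3×3×3 "same" convolution via zero-padding, ReLU, then 2×2×2 max pooling, the 3-D analogue of the 2×2 pooling kernel named in the claim); it is not the patented implementation:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def encode_step(x, kernel):
    """Encoding flow sketch: 3x3x3 'same' convolution -> ReLU -> 2x2x2
    max pooling. x: (D, H, W) volume; kernel: (3, 3, 3) weights."""
    xp = np.pad(x, 1)                          # zero-padding keeps size unchanged
    win = sliding_window_view(xp, (3, 3, 3))   # shape (D, H, W, 3, 3, 3)
    feat = np.einsum('dhwijk,ijk->dhw', win, kernel)   # convolution
    feat = np.maximum(feat, 0.0)               # ReLU activation
    d, h, w = (s // 2 for s in feat.shape)     # 2x2x2 max pooling halves each dim
    pooled = feat[:2*d, :2*h, :2*w].reshape(d, 2, h, 2, w, 2).max(axis=(1, 3, 5))
    return pooled
```

A real encoding block would repeat this with n×32 kernels and learned weights; the mechanics per kernel are as shown.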
constructing the decoding blocks of the 1st layer of the overall segmentation network comprises the following steps:
I: determining the number of needed convolution kernels, initializing the convolution kernels, and selecting an upsampling method and an activation function, specifically comprising the following operations:
selecting four convolution kernels, wherein parameters of each of two convolution kernels of the four convolution kernels are 3×3×3, i.e., three-dimensional matrix data with a height of 3, a width of 3 and a channel number of 3, and parameters of each of the other two convolution kernels of the four convolution kernels are 1×1×1, i.e., three-dimensional matrix data with a height of 1, a width of 1 and a channel number of 1, the parameters of all the convolution kernels being initialized in the form of random decimal matrices; setting the number of layers of each of the two 3×3×3 convolution kernels to n×32, wherein n represents the nth layer of the overall segmentation network, and setting the number of layers of each of the two 1×1×1 convolution kernels to 2; setting the moving step length to one pixel step length and using zero-padding so that the size of the outputted image data remains unchanged; and selecting trilinear interpolation as the upsampling method and selecting a Sigmoid function as the activation function;
II: establishing a processing flow for the inputted encoded feature data;
the processing flow for the inputted encoded feature data in the step II specifically comprises the following steps:
1. Splicing the feature data, through skip connection, with the feature data extracted by the encoding blocks of the same layer to obtain feature data with twice the number of original channels;
2. Performing continuous convolutional operations on the inputted image data of the 1st layer with the 3×3×3 convolution kernels and the 1×1×1 convolution kernels to extract new feature data;
3. Performing nonlinear activation on the extracted feature data by using the Sigmoid function; and
4. Upsampling the feature data by means of trilinear interpolation: doubling the three-dimensional data matrix of the features extracted by the encoding blocks, first calculating and completing the edge data of the matrix by means of one-dimensional linear interpolation, and then selecting each dimensional direction in turn to calculate and complete the data in the matrix by means of one-dimensional linear interpolation;
repeating the processing flow according to the quantity of convolution kernels of the same type selected in step I of constructing the decoding blocks of the 1st layer;
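The trilinear upsampling in step 4 reduces to one-dimensional linear interpolation applied along each axis in turn. A minimal NumPy sketch of that doubling (an illustration under the assumption that edge samples clamp to the border, which the claim does not specify):

```python
import numpy as np

def double_axis(x, axis):
    """Double the length of one axis by 1-D linear interpolation:
    the per-dimension building block of trilinear upsampling."""
    x = np.moveaxis(x, axis, -1)
    n = x.shape[-1]
    old = np.arange(n, dtype=float)
    new = np.linspace(0.0, n - 1, 2 * n)   # 2n samples; endpoints stay exact
    out = np.apply_along_axis(lambda row: np.interp(new, old, row), -1, x)
    return np.moveaxis(out, -1, axis)

def trilinear_upsample(vol):
    """Apply the 1-D doubling along each of the three dimensions in turn,
    yielding a volume twice as large in every dimension."""
    for ax in range(3):
        vol = double_axis(vol, ax)
    return vol
```

Because each pass is linear, the order in which the three axes are processed does not change the result.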
constructing the decoding blocks of the ith layer, wherein 1<i≤n, of the overall segmentation network comprises the following steps:
I: determining the number of needed convolution kernels, initializing the convolution kernels, and selecting an upsampling method and an activation function, specifically comprising the following operations:
selecting two convolution kernels, wherein parameters of each convolution kernel are 3×3×3, i.e., three-dimensional matrix data with a height of 3, a width of 3 and a channel number of 3, and initializing the parameters of each convolution kernel in the form of a random decimal matrix; setting the number of layers of the convolution kernels to n×32, wherein n represents the nth layer of the overall segmentation network; setting the moving step length to one pixel step length and using zero-padding so that the size of the outputted image data remains unchanged; and selecting trilinear interpolation as the upsampling method and selecting a ReLU function as the activation function;
II: establishing a processing flow for the inputted encoded feature data;
the processing flow for the inputted encoded feature data in the step II specifically comprises the following steps:
1. Splicing the feature data, through skip connection, with the feature data extracted by the encoding blocks of the same layer to obtain feature data with twice the number of original channels;
2. Performing a convolutional operation on the inputted image data of the ith layer with the convolution kernels to extract new feature data;
3. Performing nonlinear activation on the extracted feature data by using the ReLU function; and
4. Upsampling the feature data by means of trilinear interpolation: doubling the three-dimensional data matrix of the features extracted by the encoding blocks, first calculating and completing the edge data of the matrix by means of one-dimensional linear interpolation, and then selecting each dimensional direction in turn to calculate and complete the data in the matrix by means of one-dimensional linear interpolation;
In the S3, dividing the overall segmentation network into m sub segmentation networks specifically comprises two ways: I, directly using the single-layered segmentation network of the ith layer as the kth sub segmentation network, wherein i is an integer layer index, k is an integer index of the sub segmentation networks, 1≤i≤n, and 1≤k≤m; II, serially connecting and combining the single-layered segmentation networks of the ith layer through the jth layer as the kth sub segmentation network, wherein i and j are both layer indices, k is an integer index of the sub segmentation networks, 1≤i<j≤n, and 1≤k≤m; training the overall segmentation network comprises sequentially training the 1st sub segmentation network through the mth sub segmentation network; since the original image data must be inputted to the encoding blocks of the 1st layer along a specific path for training the sub segmentation networks, and the image data with labeled segmentation is finally outputted by the decoding blocks of the 1st layer, training the kth sub segmentation network requires combining the 1st sub segmentation network through the kth sub segmentation network together, and specifically comprises the following steps:
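The two ways of dividing the network amount to partitioning layers 1..n into m contiguous groups: a one-layer group corresponds to way I, a multi-layer group to way II. A small sketch of that bookkeeping (the `boundaries` parameter, listing the last layer of each sub-network, is an illustrative convention not named in the claim):

```python
def split_into_subnetworks(n, boundaries):
    """Partition layers 1..n into m contiguous sub segmentation networks.
    boundaries[k] is the last layer of the (k+1)th sub-network; groups of
    size 1 realize way I, larger groups realize way II."""
    groups, start = [], 1
    for end in boundaries:
        assert start <= end <= n, "boundaries must be increasing and within 1..n"
        groups.append(list(range(start, end + 1)))
        start = end + 1
    assert start == n + 1, "boundaries must cover all n layers"
    return groups
```

For example, `split_into_subnetworks(5, [1, 3, 5])` mixes both ways: layer 1 alone, then layers 2-3, then layers 4-5.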
I: training the (k−1)th sub segmentation network;
II: fixing the parameters of the convolution kernels of the trained (k−1)th sub segmentation network; and
III: training the kth sub segmentation network until training of all the sub segmentation networks is completed;
wherein training the kth sub segmentation network specifically comprises:
with the parameters of the convolution kernels of the trained previous (k−1) sub segmentation networks fixed, inputting the original medical image data of the training set into the encoding blocks of the sub segmentation network of the 1st layer to complete the encoding block-decoding block path of the sub segmentation network, so that the decoding blocks of the sub segmentation network of the 1st layer finally output the image data with labeled segmentation; calculating a loss function value according to the output result and the corresponding known image data with labeled segmentation, and optimizing the parameters of the convolution kernels of the kth sub segmentation network by means of a gradient back-propagation algorithm utilizing the calculated value; and continuously iterating the process until the loss function value is optimal, and taking the parameters of the convolution kernels at that time as the optimum parameters of the convolution kernels of the kth sub segmentation network.
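The stagewise schedule above (train sub-network k with the previous k−1 frozen) can be illustrated on a toy problem. In this sketch each scalar wk stands in for one sub-network's kernel parameters and the model is simply y = (w1·…·wm)·x; everything about it, including the learning rate and loss, is an illustrative assumption, not the claimed network:

```python
import numpy as np

def train_stagewise(xs, ys, m, lr=0.01, iters=500):
    """Boosting-style schedule: stage k runs gradient descent on the mean
    squared loss with respect to w[k] only; w[0..k-1] stay fixed, mirroring
    steps I-III of the claim."""
    w = np.ones(m)
    for k in range(m):                        # train the (k+1)th "sub-network"
        for _ in range(iters):
            pred = np.prod(w) * xs
            # gradient of the loss w.r.t. the active parameter only
            grad = np.mean(2.0 * (pred - ys) * xs * np.prod(np.delete(w, k)))
            w[k] -= lr * grad                 # frozen parameters are untouched
    return w
```

The key property carried over from the claim is that each stage's optimizer sees gradients only for the newest sub-network's parameters, while the forward pass still runs through the whole chain.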