US 12,229,651 B2
Block-based inference method for memory-efficient convolutional neural network implementation and system thereof
Chao-Tsung Huang, Hsinchu (TW)
Assigned to NATIONAL TSING HUA UNIVERSITY, Hsinchu (TW)
Filed by NATIONAL TSING HUA UNIVERSITY, Hsinchu (TW)
Filed on Oct. 6, 2020, as Appl. No. 17/064,561.
Claims priority of provisional application 62/912,630, filed on Oct. 8, 2019.
Claims priority of application No. 109130493 (TW), filed on Sep. 4, 2020.
Prior Publication US 2021/0103793 A1, Apr. 8, 2021
Int. Cl. G06N 3/04 (2023.01); G06N 5/04 (2023.01)
CPC G06N 3/04 (2013.01) [G06N 5/04 (2013.01)] 12 Claims
OG exemplary drawing
 
1. A block-based inference method for a memory-efficient convolutional neural network implementation, which is performed to process an input image, the block-based inference method for the memory-efficient convolutional neural network implementation comprising:
performing a parameter setting step to set an inference parameter group, wherein the inference parameter group comprises a depth, a block width, a block height and a plurality of layer kernel sizes;
performing a dividing step to drive a processing unit to divide the input image into a plurality of input block data according to the depth, the block width, the block height and the layer kernel sizes, wherein each of the input block data has an input block size;
performing a block-based inference step to drive the processing unit to execute a multi-layer convolution operation on each of the input block data to generate an output block data, wherein the multi-layer convolution operation comprises:
performing a first direction data selecting step to select a plurality of ith layer recomputing features according to a position of the output block data along a scanning line feed direction, and then select an ith layer recomputing input feature block data according to the position of the output block data and the ith layer recomputing features, wherein i is one of a plurality of positive integers from 1 to the depth;
performing a second direction data selecting step to select a plurality of ith layer reusing features according to the ith layer recomputing input feature block data along a block scanning direction, and then combine the ith layer recomputing input feature block data with the ith layer reusing features to generate an ith layer reusing input feature block data; and
performing a convolution operation step to select a plurality of ith layer sub-block input feature groups from the ith layer reusing input feature block data according to an ith layer kernel size, and then execute a convolution operation on each of the ith layer sub-block input feature groups and a convolution parameter group to generate each of a plurality of ith layer sub-block output features, and combine the ith layer sub-block output features corresponding to the ith layer sub-block input feature groups to form an ith layer output feature block data; and
performing a temporary storing step to drive a block buffer bank to store the ith layer output feature block data and the ith layer reusing features;
wherein in response to determining that at least one of a plurality of input features of one of the ith layer sub-block input feature groups is located in an outer region of the ith layer reusing input feature block data, the input features of the one of the ith layer sub-block input feature groups comprise a plurality of outer block features and a plurality of first inner block features, the outer block features represent the input features that have been calculated, and the first inner block features represent the input features that have not been calculated;
in response to determining that the input features of the one of the ith layer sub-block input feature groups are all located in an inner region of the ith layer reusing input feature block data, the input features of the one of the ith layer sub-block input feature groups only comprise a plurality of second inner block features, and the second inner block features represent the input features that have not been calculated;
the ith layer reusing input feature block data has the outer region and the inner region in sequence along the block scanning direction; and
the outer block features are stored in the block buffer bank, the block buffer bank has a temporary storage space, the temporary storage space is calculated according to a width of the ith layer recomputing input feature block data, the depth, a layer number, a channel number and the ith layer kernel size, the width of the ith layer recomputing input feature block data is represented as BWi, the depth is represented as D, the layer number is represented as i, the channel number is represented as Ci, the ith layer kernel size is kWi×kHi, and the temporary storage space is represented as LBS and described as follows:
LBS=Σi=1D(kHi−1)·BWi·Ci.
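The claimed flow can be illustrated with a short, self-contained sketch. The NumPy code below is one possible reading of claim 1, not the patented implementation: it assumes the block scanning direction is vertical, so the last $kH_i - 1$ rows of each layer's input block are reused through a per-layer buffer bank, and the scanning line feed direction is horizontal, so the overlapping columns of each block column are recomputed, which is consistent with the temporary-storage formula above. All names (conv_valid, block_inference, buffers) and the example kernel sizes and channel counts are illustrative assumptions.

```python
# Minimal sketch of block-based multi-layer convolution with hybrid
# recompute (across block columns) and reuse (down each block column).
import numpy as np

def conv_valid(x, w):
    """Direct 'valid' convolution. x: (H, W, Cin), w: (kH, kW, Cin, Cout)."""
    kH, kW, Cin, Cout = w.shape
    H, W, _ = x.shape
    out = np.zeros((H - kH + 1, W - kW + 1, Cout))
    for dy in range(kH):
        for dx in range(kW):
            patch = x[dy:dy + out.shape[0], dx:dx + out.shape[1], :]
            out += np.tensordot(patch, w[dy, dx], axes=([2], [0]))
    return out

def reference(x, weights):
    """Conventional layer-by-layer inference over the whole image."""
    for w in weights:
        x = conv_valid(x, w)
    return x

def block_inference(image, weights, bh, bw):
    """Blocked multi-layer inference: recompute the overlapping columns of each
    block column, reuse the last (kH_i - 1) rows of every layer input across
    blocks in the same column (held in `buffers`, the block buffer bank)."""
    D = len(weights)
    kHs = [w.shape[0] for w in weights]
    kWs = [w.shape[1] for w in weights]
    H, W, _ = image.shape
    OH = H - sum(k - 1 for k in kHs)              # full output height
    OW = W - sum(k - 1 for k in kWs)              # full output width
    out = np.zeros((OH, OW, weights[-1].shape[3]))

    for x0 in range(0, OW, bw):                   # recompute direction (block columns)
        bw_out = min(bw, OW - x0)
        in_w = bw_out + sum(k - 1 for k in kWs)   # layer-1 recomputing block width
        buffers = [None] * D                      # per-layer reusing features
        y_img = 0                                 # next unread image row
        for y0 in range(0, OH, bh):               # reuse direction (blocks in a column)
            bh_out = min(bh, OH - y0)
            # First block of a column has no reusing features yet, so it reads
            # extra image rows and computes every layer from scratch.
            rows = bh_out + (sum(k - 1 for k in kHs) if buffers[0] is None else 0)
            feat = image[y_img:y_img + rows, x0:x0 + in_w, :]
            y_img += rows
            for i, w in enumerate(weights):
                if buffers[i] is not None:        # prepend reused rows from the buffer bank
                    feat = np.concatenate([buffers[i], feat], axis=0)
                # Store the last (kH_i - 1) rows of this layer's input for the next block.
                buffers[i] = feat[-(kHs[i] - 1):].copy() if kHs[i] > 1 else feat[:0]
                feat = conv_valid(feat, w)        # ith layer output feature block data
            out[y0:y0 + bh_out, x0:x0 + bw_out, :] = feat
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative network: 3 layers, 3x3 kernels, channels 3 -> 8 -> 8 -> 4.
    channels = [3, 8, 8, 4]
    weights = [rng.standard_normal((3, 3, channels[i], channels[i + 1]))
               for i in range(3)]
    image = rng.standard_normal((38, 38, 3))
    ref = reference(image, weights)
    blk = block_inference(image, weights, bh=8, bw=8)
    print(np.allclose(ref, blk))                  # True: blocked result matches full-image result
```

Running the script prints True, showing that the blocked traversal with recomputation along the width and reuse along the height reproduces the full-image multi-layer convolution while only buffering a few rows per layer.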
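The temporary-storage bound $LBS$ can likewise be checked numerically. The helper below (a hypothetical name, not from the patent) evaluates $LBS = \sum_{i=1}^{D}(kH_i-1)\cdot BW_i\cdot C_i$ for the three-layer example used in the sketch above, whose per-layer recomputing input widths are 14, 12 and 10 and whose input channel counts are 3, 8 and 8 with 3×3 kernels.

```python
def line_buffer_size(kernel_hs, block_widths, channels):
    """Evaluate LBS = sum_i (kH_i - 1) * BW_i * C_i, in feature values."""
    return sum((kh - 1) * bw * c
               for kh, bw, c in zip(kernel_hs, block_widths, channels))

# Example matching the sketch above: 2*14*3 + 2*12*8 + 2*10*8 = 436 feature values.
print(line_buffer_size([3, 3, 3], [14, 12, 10], [3, 8, 8]))  # 436
```

This total equals the number of feature values held in `buffers` for one block column in the sketch, i.e. the capacity the block buffer bank must provide.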