CPC G06N 3/063 (2013.01) [G06N 3/08 (2013.01); G06N 5/046 (2013.01); G06N 20/00 (2019.01); G06T 1/20 (2013.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)]
14 Claims
1. A neural network model deployment method, applied to a terminal device, the method comprising:
reading an initial neural network model, wherein the initial neural network model comprises a convolutional neural network (CNN) model;
obtaining a layer definition of each network layer of the initial neural network model;
obtaining trained model parameters of each network layer of the initial neural network model;
executing, by using a layer class, a target network layer corresponding to each network layer in the terminal device separately according to the layer definition of each network layer, so that each target network layer is inherited from the layer class;
applying relational connections amongst the target network layers using a net class;
converting the trained model parameters of each network layer into a preset format different from their original format;
obtaining target model parameters of each network layer based on the preset format;
loading corresponding target model parameters in the target network layer corresponding to each network layer separately according to the target model parameters of each network layer;
obtaining a target neural network model for deployment in the terminal device based on the target model parameters of each network layer; and
deploying the target neural network model in the terminal device by:
identifying, in the target neural network model, a convolutional layer for generation of OC channels of output feature maps from IC channels of input feature maps based on OC convolutional kernels, each of the input feature maps and output feature maps comprising W by H pixels, OC, IC, W, and H being positive integers, W and H being pixel width and pixel height of each of the input feature maps and output feature maps; and
deploying the convolutional layer to calculate each of the W*H pixels of each of the OC channels of output feature maps by performing convolution with a kernel corresponding to each of the OC channels in a GPU, wherein the convolutions for pixels of the OC channels of output feature maps are computed in parallel between every at least 8 adjacent pixels in each of the OC channels of output feature maps but consecutively for each pixel within the every at least 8 adjacent pixels in order to reduce an average number of memory reads needed for calculating each pixel of the output feature maps by at least a factor of 2 in comparison to calculating the convolutions of all pixels in each of the OC output feature maps in parallel.
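
The memory-read reduction recited in the final step can be illustrated with a small CPU simulation. This is only a sketch under stated assumptions, not the claimed GPU implementation: the claim does not fix the kernel size, the padding, or the direction of adjacency, so the 3 by 3 kernel, the zero padding, the horizontal tiles of 8 pixels, and all function and class names below are hypothetical. Each iteration of the outer loops stands in for one GPU thread. Under these assumptions, one tile needs 3 rows of 10 inputs (30 reads) per input channel instead of 8 pixels times 9 inputs (72 reads), a factor of 2.4.

```python
import random


class CountingInput:
    """Input feature maps that count every element fetch (hypothetical helper).

    Fetches outside the map return 0 (zero padding) and are still counted,
    which is a simplifying assumption of this sketch.
    """

    def __init__(self, data):
        self.data = data
        self.h = len(data[0])
        self.w = len(data[0][0])
        self.reads = 0

    def read(self, c, y, x):
        self.reads += 1
        if 0 <= y < self.h and 0 <= x < self.w:
            return self.data[c][y][x]
        return 0


def conv_pixel_per_thread(src, kernels, k):
    """Baseline: every output pixel is computed independently."""
    oc, ic, pad = len(kernels), len(kernels[0]), k // 2
    out = [[[0] * src.w for _ in range(src.h)] for _ in range(oc)]
    for o in range(oc):
        for y in range(src.h):
            for x in range(src.w):  # one simulated thread per pixel
                acc = 0
                for c in range(ic):
                    for dy in range(k):
                        for dx in range(k):
                            acc += kernels[o][c][dy][dx] * src.read(
                                c, y + dy - pad, x + dx - pad)
                out[o][y][x] = acc
    return out


def conv_tile_per_thread(src, kernels, k, tile=8):
    """Tiled: one simulated thread computes `tile` adjacent pixels in a row."""
    oc, ic, pad = len(kernels), len(kernels[0]), k // 2
    out = [[[0] * src.w for _ in range(src.h)] for _ in range(oc)]
    for o in range(oc):
        for y in range(src.h):
            for x0 in range(0, src.w, tile):  # one simulated thread per tile
                n = min(tile, src.w - x0)
                acc = [0] * n
                for c in range(ic):
                    for dy in range(k):
                        # Fetch the input strip once; neighbours reuse it.
                        strip = [src.read(c, y + dy - pad, x0 + j - pad)
                                 for j in range(n + k - 1)]
                        for i in range(n):  # pixels in the tile, one by one
                            for dx in range(k):
                                acc[i] += kernels[o][c][dy][dx] * strip[i + dx]
                for i in range(n):
                    out[o][y][x0 + i] = acc[i]
    return out


def compare_reads(oc=3, ic=2, h=4, w=16, k=3):
    """Run both variants on random data; return (outputs_equal, reads, reads)."""
    rng = random.Random(0)
    data = [[[rng.randint(-3, 3) for _ in range(w)] for _ in range(h)]
            for _ in range(ic)]
    kernels = [[[[rng.randint(-2, 2) for _ in range(k)] for _ in range(k)]
                for _ in range(ic)] for _ in range(oc)]
    a, b = CountingInput(data), CountingInput(data)
    out_a = conv_pixel_per_thread(a, kernels, k)
    out_b = conv_tile_per_thread(b, kernels, k)
    return out_a == out_b, a.reads, b.reads


if __name__ == "__main__":
    same, per_pixel, tiled = compare_reads()
    print(same, per_pixel, tiled, per_pixel / tiled)
```

In this model the ratio of reads is 8k / (8 + k - 1) for a k by k kernel, so a 1 by 1 kernel would gain nothing; the factor of at least 2 is reached only from k = 3 upward under these assumptions.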
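
The earlier steps of claim 1 (building target layers from layer definitions through a common layer class, connecting them with a net class, converting trained parameters to a preset format, and loading them) can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim does not name the preset format or any API, so the flat float32 array, the toy `Scale` and `Relu` layers, the dictionary-based layer definitions, and every identifier here are assumptions made for the example.

```python
from array import array


class Layer:
    """Base layer class; every target network layer inherits from it."""

    def __init__(self, name):
        self.name = name
        self.params = None

    def load_params(self, params):
        self.params = params

    def forward(self, x):
        raise NotImplementedError


class Scale(Layer):
    """Toy parameterised layer: multiplies each input by one weight."""

    def forward(self, x):
        return [w * v for w, v in zip(self.params, x)]


class Relu(Layer):
    """Toy parameter-free layer."""

    def forward(self, x):
        return [max(0.0, v) for v in x]


# Hypothetical registry mapping a layer definition's type to a Layer subclass.
LAYER_TYPES = {"scale": Scale, "relu": Relu}


class Net:
    """Net class: records which layer feeds which and runs them in order."""

    def __init__(self):
        self.layers = []  # layers in execution order
        self.bottom = {}  # layer name -> name of the layer (or "data") feeding it

    def add(self, layer, bottom):
        self.layers.append(layer)
        self.bottom[layer.name] = bottom

    def forward(self, data):
        blobs = {"data": data}
        for layer in self.layers:
            blobs[layer.name] = layer.forward(blobs[self.bottom[layer.name]])
        return blobs[self.layers[-1].name]


def convert_params(raw):
    """Convert nested lists (assumed original format) to a flat float32 array
    (assumed preset format)."""
    flat = array("f")

    def walk(value):
        if isinstance(value, (list, tuple)):
            for item in value:
                walk(item)
        else:
            flat.append(float(value))

    walk(raw)
    return flat


def build_net(layer_defs, trained):
    """Create one target layer per definition, load converted parameters,
    and connect the layers through the net class."""
    net = Net()
    for d in layer_defs:
        layer = LAYER_TYPES[d["type"]](d["name"])
        if d["name"] in trained:
            layer.load_params(convert_params(trained[d["name"]]))
        net.add(layer, d["bottom"])
    return net


if __name__ == "__main__":
    defs = [
        {"name": "s1", "type": "scale", "bottom": "data"},
        {"name": "r1", "type": "relu", "bottom": "s1"},
    ]
    net = build_net(defs, {"s1": [[2, -1], [0.5, 4]]})
    print(net.forward([1.0, 3.0, -2.0, 1.0]))
```

Keeping the connections in the net class rather than in the layers means a layer subclass only has to implement `forward` and accept already-converted parameters, which mirrors the separation between the layer class and the net class in the claim.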