CPC G06N 3/08 (2013.01) [G06F 12/0207 (2013.01); G06F 17/153 (2013.01); G06F 17/16 (2013.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 20/10 (2019.01); H03M 7/30 (2013.01)] | 22 Claims |
1. A method for optimizing a neural network that includes a convolution layer, implemented by a device comprising at least one processor and memory, the method comprising:
allocating, by the at least one processor, from the memory, a memory space;
receiving, by the at least one processor, input data of the convolution layer;
compressing, by the at least one processor and along a channel direction of the convolution layer, the input data into compressed input data of the convolution layer;
compressing, by the at least one processor and along the channel direction of the convolution layer, convolution kernels of the convolution layer into compressed convolution kernels;
dividing, by the at least one processor, the compressed input data of the convolution layer into blocks of the compressed input data, each block being of the same size as one of the compressed convolution kernels;
generating, by the at least one processor, output data of the convolution layer by performing convolution computation on the blocks of the compressed input data and the compressed convolution kernels; and
storing the output data of the convolution layer into the memory space, wherein
the output data of the convolution layer includes multiple elements each of which corresponds to one of the blocks of the compressed input data,
each of the multiple elements is obtained by performing the convolution computation on one of the blocks of the compressed input data using the compressed convolution kernels, and
the storing the output data of the convolution layer comprises storing respectively the multiple elements of the output data of the convolution layer into the memory space.
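The claimed steps can be illustrated with a minimal NumPy sketch. All names here are hypothetical, and the channel-direction compression is modeled as a simple sum over the channel axis purely as a stand-in, since the claim does not specify the compression operator: the input and the kernels are each compressed along the channel direction, the compressed input is tiled into kernel-sized blocks, and each output element is computed from one block and stored into the pre-allocated output buffer (the claim's "memory space").

```python
import numpy as np

def compress_channels(x):
    # Hypothetical stand-in for the claimed channel-direction compression:
    # sum over the channel axis, reducing (C, H, W) to (H, W). The patent
    # does not specify the actual compression operator.
    return x.sum(axis=0)

def convolve_compressed(input_data, kernels, stride=1):
    """Sketch of the claimed flow: compress input and kernels along the
    channel direction, divide the compressed input into blocks the same
    size as a compressed kernel, and compute one output element per block."""
    comp_in = compress_channels(input_data)             # (H, W)
    comp_ks = [compress_channels(k) for k in kernels]   # each (kh, kw)
    kh, kw = comp_ks[0].shape
    H, W = comp_in.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    # Pre-allocated output buffer, playing the role of the "memory space".
    output = np.empty((len(comp_ks), out_h, out_w))
    for c, k in enumerate(comp_ks):
        for i in range(out_h):
            for j in range(out_w):
                # Block of compressed input, same size as the compressed kernel.
                block = comp_in[i*stride:i*stride+kh, j*stride:j*stride+kw]
                # One output element per block, stored individually.
                output[c, i, j] = np.sum(block * k)
    return output
```

In this reading, the per-element storage in the final wherein clause corresponds to writing each `output[c, i, j]` as its block's convolution completes, rather than materializing the full uncompressed intermediate.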