US 12,271,820 B2
Neural network acceleration and neural network acceleration method based on structured pruning and low-bit quantization
Kejie Huang, Zhejiang (CN); Chaoyang Zhu, Zhejiang (CN); and Haibin Shen, Zhejiang (CN)
Assigned to ZHEJIANG UNIVERSITY, Hangzhou (CN)
Filed by ZHEJIANG UNIVERSITY, Zhejiang (CN)
Filed on Sep. 27, 2021, as Appl. No. 17/485,645.
Application 17/485,645 is a continuation of application No. PCT/CN2020/099891, filed on Jul. 2, 2020.
Claims priority of application No. 201910609993.5 (CN), filed on Jul. 8, 2019.
Prior Publication US 2022/0012593 A1, Jan. 13, 2022
Int. Cl. G06N 3/082 (2023.01); G06F 9/50 (2006.01); G06N 3/063 (2023.01)

CPC G06N 3/082 (2013.01) [G06F 9/5027 (2013.01); G06N 3/063 (2013.01)]

4 Claims

2. A neural network accelerator based on structured pruning and low-bit quantization, comprising:

a master controller;

an activations selection unit;

an extensible calculation array;

a multifunctional processing element;

a Direct Memory Access (DMA);

a Dynamic Random Access Memory (DRAM); and

a buffer,

wherein the master controller is respectively connected with the activations selection unit, the extensible calculation array and the DMA; the DMA is respectively connected with the buffer and the DRAM; the buffer is respectively connected with the multifunctional processing element and the activations selection unit; and the extensible calculation array is respectively connected with the activations selection unit and the buffer;

the master controller is configured for parsing an instruction set to generate a first storage address of input activation and weights, a storage address of output activation, and control signals;

the buffer is configured for storing the input activation, the output activation and weight indexes;

the activations selection unit is configured for selecting the input activation inputted from the buffer according to the control signals generated by the master controller and transmitting the input activation to the extensible calculation array;

the extensible calculation array comprises N×M processing elements (PEs); N and M represent rows and columns of the PEs respectively; each PE is configured to store part of the weights of the neural network, determine a second storage address of the weights according to the received weight indexes, acquire the weights corresponding to the input activation according to the second storage address of the weights, and control the reading of the weights and the on-off state of a multiplier in the PE by judging whether the received input activation is zero; each PE is configured to judge whether the currently calculated output activations complete the convolution of the input activations and the weight of an input channel according to the control signals generated by the master controller, and if so, the PE stores the output activations into an output activations buffer in the buffer through the activations selection unit;

the multifunctional processing element is configured for completing pooling, activation and normalization operations of the network; and

the DMA is configured for reading the weights stored in the DRAM according to the first storage address of the weights, reading the output activation stored in the buffer according to the storage address of the current output activations, and transmitting the output activation to the DRAM for storage;

wherein the activations selection unit comprises:

an input activations register;

an index decoder; and

a selector,

wherein the selector is respectively connected with the input activations register and the index decoder;

the input activations register is configured for reading in and outputting the input activation according to the control signals generated by the master controller;

the index decoder is configured for decoding weight indexes to generate a jump value; and

the selector is configured for selecting the input activation according to the jump value and transmitting the input activation to the extensible calculation array.
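
The data path recited above (index decoder, selector, and zero-gated multiplier in a PE) can be illustrated with a minimal software sketch. It is not the claimed hardware and rests on assumptions the claim does not state: each weight index is taken to be the number of pruned positions skipped before the next retained weight, the jump value is that index plus one, and the second storage address is taken to be the ordinal position of the retained weight in the PE's local weight memory. The function names decode_jump_values, select_activations and pe_multiply_accumulate are hypothetical.

```python
from typing import List, Sequence, Tuple


def decode_jump_values(weight_indexes: Sequence[int]) -> List[int]:
    """Index decoder (assumed encoding): a weight index counts the pruned
    positions skipped before the next retained weight, so jump = index + 1."""
    return [index + 1 for index in weight_indexes]


def select_activations(activation_register: Sequence[int],
                       jump_values: Sequence[int]) -> List[int]:
    """Selector: step through the input activations register by each jump
    value and pick the activation that pairs with a retained weight."""
    selected = []
    position = -1  # start just before the first register entry
    for jump in jump_values:
        position += jump
        selected.append(activation_register[position])
    return selected


def pe_multiply_accumulate(selected: Sequence[int],
                           weight_memory: Sequence[int]) -> Tuple[int, int]:
    """One PE: accumulate activation * weight, skipping the weight read and
    the multiplication whenever the received activation is zero.

    Assumption: the k-th selected activation pairs with the weight stored at
    local address k. Returns (partial sum, number of multiplications)."""
    partial_sum = 0
    multiplications = 0
    for address, activation in enumerate(selected):
        if activation == 0:
            continue  # zero activation: no weight read, multiplier stays off
        partial_sum += activation * weight_memory[address]
        multiplications += 1
    return partial_sum, multiplications


# Hypothetical data: 8 input activations, 4 retained weights out of 8.
activations = [3, 0, 0, 0, 0, 2, 7, 1]
weight_indexes = [0, 1, 2, 0]        # retained weights sit at positions 0, 2, 5, 6
retained_weights = [2, -1, 4, 1]     # compressed weights held in the PE

jumps = decode_jump_values(weight_indexes)
selected = select_activations(activations, jumps)
partial_sum, multiplications = pe_multiply_accumulate(selected, retained_weights)
```

Under these assumptions the sketch yields the same partial sum as a dense dot product against the unpruned weight vector [2, 0, -1, 0, 0, 4, 1, 0], while performing only three multiplications, because four positions are pruned and one of the selected activations is zero.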