US 12,217,184 B2
Low-power, high-performance artificial neural network training accelerator and acceleration method
Hoi Jun Yoo, Daejeon (KR); and Sang Yeob Kim, Daejeon (KR)
Assigned to Korea Advanced Institute of Science and Technology, Daejeon (KR)
Filed by Korea Advanced Institute of Science and Technology, Daejeon (KR)
Filed on May 12, 2021, as Appl. No. 17/317,900.
Claims priority of application No. 10-2021-0003403 (KR), filed on Jan. 11, 2021.
Prior Publication US 2022/0222533 A1, Jul. 14, 2022
Int. Cl. G06N 3/082 (2023.01); G06F 9/50 (2006.01); G06F 15/80 (2006.01); G06N 3/063 (2023.01)
CPC G06N 3/082 (2013.01) [G06F 9/5027 (2013.01); G06F 15/80 (2013.01); G06N 3/063 (2013.01)] 7 Claims
OG exemplary drawing
 
1. A method of accelerating training of a low-power, high-performance artificial neural network (ANN), the method comprising:
(a) performing fine-grained pruning and coarse-grained pruning to generate sparsity in weights by a pruning unit in a convolution core of a cluster in a low-power, high-performance ANN trainer, wherein
the fine-grained pruning generates a random sparsity pattern by replacing values with small magnitudes with zeros, and
the coarse-grained pruning calculates similarities between weights, or magnitudes of the weights, on an output channel basis and replaces similar consecutive weights, or consecutive weights with small magnitudes, with consecutive zeros;
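The two pruning passes of step (a) can be sketched as follows. This is a minimal software model of the behavior the claim describes, not the patented pruning-unit hardware; the function names, the per-channel mean-magnitude criterion, and the thresholds are illustrative assumptions.

```python
import numpy as np

def fine_grained_prune(w, threshold):
    """Replace individual weights with small magnitudes by zeros,
    producing a random (unstructured) sparsity pattern."""
    pruned = w.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

def coarse_grained_prune(w, threshold):
    """Zero out whole output channels whose mean magnitude is small,
    producing runs of consecutive zeros in the weight stream.
    w has shape (out_channels, in_channels, kh, kw)."""
    pruned = w.copy()
    channel_mag = np.abs(pruned).mean(axis=(1, 2, 3))
    pruned[channel_mag < threshold] = 0.0  # whole channel becomes consecutive zeros
    return pruned
```

Applying the fine-grained pass first and the coarse-grained pass on top of it yields a weight tensor containing both isolated random zeros and long zero runs, which is the mixed sparsity pattern step (b) exploits.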
(b) selecting and performing dual zero skipping according to input sparsity, output sparsity, and the sparsity of weights by the convolution core,
wherein, when the convolution core performs the dual zero skipping by using the sparsity of weights, the convolution core skips zeros in weight data by
skipping computations using the consecutive zeros caused by the coarse-grained pruning at once, and
skipping computations using random zeros caused by the fine-grained pruning one at a time; and
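The two skipping modes of step (b) can be modeled with a single traversal over a flattened weight vector. This is a behavioral sketch under assumed data structures (a `zero_runs` map recording where coarse-grained zero blocks start and how long they are), not the accelerator's datapath.

```python
def sparse_dot(weights, inputs, zero_runs):
    """Dual zero skipping over a 1-D weight stream.

    zero_runs maps a start index to the length of a block of
    consecutive zeros produced by coarse-grained pruning; such a
    block is skipped in one jump. Isolated zeros produced by
    fine-grained pruning are skipped one element at a time."""
    acc = 0.0
    i = 0
    while i < len(weights):
        if i in zero_runs:            # coarse-grained block: skip at once
            i += zero_runs[i]
        elif weights[i] == 0.0:       # fine-grained random zero: skip one
            i += 1
        else:                         # nonzero weight: do the multiply-accumulate
            acc += weights[i] * inputs[i]
            i += 1
    return acc
```

The jump over a whole run is what makes coarse-grained sparsity cheaper to exploit than random sparsity: one index update replaces a run's worth of per-element checks.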
(c) restricting access to a weight memory during training by allowing a deep neural network (DNN) computation core and a weight pruning core to share weights retrieved from a memory by the convolution core.
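The weight sharing of step (c) can be illustrated by counting memory accesses when one fetch serves two consumers. The class and function names below are assumptions made for the sketch; the claim specifies only that the DNN computation core and the weight pruning core share weights retrieved by the convolution core, so the weight memory is read once per weight rather than once per consumer.

```python
class WeightMemory:
    """Toy weight memory that counts how often it is read."""
    def __init__(self, weights):
        self.weights = weights
        self.reads = 0

    def fetch(self, idx):
        self.reads += 1
        return self.weights[idx]

def shared_step(mem, idx, compute, prune):
    """The convolution core fetches a weight once and hands the same
    value to both the computation core and the pruning core, so the
    memory sees a single access per weight instead of two."""
    w = mem.fetch(idx)
    compute(w)
    prune(w)
```

With separate fetches, training would read each weight twice per step (once for computation, once for pruning); sharing halves weight-memory traffic, which is where the claimed power saving comes from.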