US 11,755,903 B2
Systems and methods for providing block-wise sparsity in a neural network
Maohua Zhu, San Mateo, CA (US); Zhenyu Gu, San Mateo, CA (US); and Yuan Xie, San Mateo, CA (US)
Assigned to Alibaba Group Holding Limited, Grand Cayman (KY)
Filed by ALIBABA GROUP HOLDING LIMITED, Grand Cayman (KY)
Filed on Jul. 24, 2019, as Appl. No. 16/521,564.
Prior Publication US 2021/0027156 A1, Jan. 28, 2021
Int. Cl. G06N 3/08 (2023.01)
CPC G06N 3/08 (2013.01)
19 Claims
 
1. A system for providing block-wise sparsity in a neural network, comprising:
at least one memory storing instructions; and
at least one processor configured to execute the instructions to cause the system to perform:
dividing a matrix of weights associated with a neural network into a plurality of blocks;
extracting non-zero elements from each of the plurality of blocks;
re-encoding the extracted non-zero elements from each block as a vector with an associated offset matrix of coordinates of the extracted non-zero elements from the block;
enforcing input sparsity in the neural network corresponding to the offset matrix, wherein enforcing input sparsity includes fetching elements of an input matrix corresponding to the coordinates in the offset matrix from an off-chip memory to the at least one memory; and
executing the neural network using the vectors and the enforced input sparsity.
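
For illustration only, the steps recited in claim 1 can be sketched in NumPy. This is one plausible reading under stated assumptions, not the patented implementation: the function names `encode_blocks` and `sparse_matmul`, the block shape, and the interpretation of "executing the neural network" as a single matrix product are all hypothetical. The gather from `inputs` stands in for fetching input elements from off-chip memory.

```python
import numpy as np

def encode_blocks(weights, block_shape):
    """Divide `weights` into blocks and re-encode each block's non-zeros.

    Returns a list of (row_start, col_start, values, offsets) tuples, where
    `values` is the vector of non-zero elements and `offsets` is an (nnz, 2)
    array of their (row, col) coordinates inside the block.
    """
    bh, bw = block_shape
    rows, cols = weights.shape
    encoded = []
    for r0 in range(0, rows, bh):
        for c0 in range(0, cols, bw):
            block = weights[r0:r0 + bh, c0:c0 + bw]
            offsets = np.argwhere(block != 0)             # coordinates of non-zeros
            values = block[offsets[:, 0], offsets[:, 1]]  # non-zeros as a vector
            encoded.append((r0, c0, values, offsets))
    return encoded

def sparse_matmul(encoded, inputs, out_rows):
    """Compute weights @ inputs using only the encoded vectors and offsets."""
    out = np.zeros((out_rows, inputs.shape[1]), dtype=np.float64)
    for r0, c0, values, offsets in encoded:
        if values.size == 0:
            continue  # an all-zero block needs no input at all
        # Assumed stand-in for the off-chip fetch: gather only the input rows
        # that the block's offset matrix points at.
        gathered = inputs[c0 + offsets[:, 1]]
        # Accumulate each scaled input row into its output row; np.add.at
        # handles repeated output rows correctly.
        np.add.at(out, r0 + offsets[:, 0], values[:, None] * gathered)
    return out
```

With a 4x6 weight matrix and `block_shape=(2, 3)`, `encode_blocks` yields four blocks, and `sparse_matmul(encoded, inputs, 4)` should agree with the dense product `weights @ inputs` up to floating-point rounding.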