CPC G06F 21/55 (2013.01) [G06F 7/5443 (2013.01); G06N 3/04 (2013.01); G06F 2221/034 (2013.01)]; 25 Claims
1. A method for exploiting fine-grained structured weight sparsity in deep neural networks in a computing environment, by one or more processors, comprising:
storing indices of a plurality of non-zero weights in an index register file included within each of a plurality of processor elements in a systolic array;
storing the plurality of non-zero weights in a register file associated with the index register file, wherein only values of the non-zero weights are stored in all memory levels associated with the plurality of processor elements in the systolic array;
sending, to one or more of the plurality of processor elements, a plurality of input values corresponding to a single block in a data structure; and
selecting one or more of the plurality of input values corresponding to the indices of the plurality of non-zero weights in the index register file for performing a multiply-accumulate (“MAC”) operation based on sending, to the one or more of the plurality of processor elements, the plurality of input values.
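The steps of claim 1 can be sketched in software: each block of weights is compressed into non-zero values plus their intra-block indices (modeling the weight register file and index register file of a processor element), and only the inputs at those indices feed the MAC. This is a minimal illustrative sketch, assuming 2:4 structured sparsity (two non-zero weights per block of four); the function names and block size are hypothetical, not taken from the claim.

```python
# Illustrative sketch of the claimed select-and-MAC step for one
# processor element. Assumes 2:4 fine-grained structured sparsity;
# names (compress_blocks, sparse_mac) are hypothetical.

def compress_blocks(weights, block=4):
    """Split a weight vector into blocks, keeping only non-zero values
    and their intra-block indices (the two register files in the claim)."""
    values, indices = [], []
    for b in range(0, len(weights), block):
        blk = weights[b:b + block]
        nz = [(i, w) for i, w in enumerate(blk) if w != 0]
        indices.append([i for i, _ in nz])   # index register file entry
        values.append([w for _, w in nz])    # weight register file entry
    return values, indices

def sparse_mac(inputs, values, indices, block=4):
    """For each input block, select only the inputs whose positions match
    stored non-zero weight indices, then multiply-accumulate."""
    acc = 0.0
    for b, (vals, idxs) in enumerate(zip(values, indices)):
        in_blk = inputs[b * block:(b + 1) * block]
        for w, i in zip(vals, idxs):
            acc += w * in_blk[i]             # MAC over selected inputs only
    return acc

# Usage: only 4 of 8 weights are non-zero, so only 4 multiplies occur,
# yet the result matches the dense dot product.
vals, idxs = compress_blocks([0, 3, 0, 2, 1, 0, 0, 4])
result = sparse_mac([1, 2, 3, 4, 5, 6, 7, 8], vals, idxs)  # 51.0
```

Storing only the non-zero values and their indices is what lets every memory level hold compressed weights, while the index register file drives the input-selection multiplexing described in the claim.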