US 11,941,111 B2
Exploiting fine-grained structured weight sparsity in systolic arrays
Sanchari Sen, Indianapolis, IN (US); Swagath Venkataramani, Yorktown Heights, NY (US); Vijayalakshmi Srinivasan, Yorktown Heights, NY (US); Kailash Gopalakrishnan, Yorktown Heights, NY (US); and Sunil K. Shukla, Yorktown Heights, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Jul. 31, 2021, as Appl. No. 17/444,187.
Prior Publication US 2023/0030287 A1, Feb. 2, 2023
Int. Cl. G06F 21/00 (2013.01); G06F 7/544 (2006.01); G06F 21/55 (2013.01); G06N 3/04 (2023.01)
CPC G06F 21/55 (2013.01) [G06F 7/5443 (2013.01); G06N 3/04 (2013.01); G06F 2221/034 (2013.01)] 25 Claims
1. A method for exploiting fine-grained structured weight sparsity in deep neural networks in a computing environment, by one or more processors, comprising:
storing indices of a plurality of non-zero weights in an index register file included within each of a plurality of processor elements in a systolic array;
storing the plurality of non-zero weights in a register file associated with the index register file, wherein only values of the non-zero weights are stored in all memory levels associated with the plurality of processor elements in the systolic array;
sending, to one or more of the plurality of processor elements, a plurality of input values corresponding to a single block in a data structure; and
selecting one or more of the plurality of input values corresponding to the indices of the plurality of non-zero weights in the index register file for performing a multiply-accumulate ("MAC") operation based on sending, to the one or more of the plurality of processor elements, the plurality of input values.
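The claimed steps can be sketched as a simple software model of one processor element: only non-zero weights and their within-block indices are stored (the weight register file and index register file of the claim), and when a block of input values arrives, the element selects just the inputs whose positions match the stored indices and performs MAC operations on them. This is an illustrative assumption-laden sketch, not the patented hardware; the class and parameter names (ProcessorElement, block_size) are hypothetical.

```python
# Illustrative sketch of one systolic-array processor element exploiting
# fine-grained structured weight sparsity. Not the patented implementation;
# names and structure are assumptions for exposition.

class ProcessorElement:
    def __init__(self, weights, block_size):
        # Index register file: positions of non-zero weights within the block.
        self.index_rf = [i for i, w in enumerate(weights) if w != 0]
        # Weight register file: only the non-zero weight values are stored.
        self.weight_rf = [w for w in weights if w != 0]
        self.block_size = block_size
        self.acc = 0.0  # running accumulator

    def receive_block(self, inputs):
        # All input values of a single block arrive; only those whose
        # positions match stored non-zero-weight indices feed the MACs.
        assert len(inputs) == self.block_size
        for w, idx in zip(self.weight_rf, self.index_rf):
            self.acc += w * inputs[idx]  # multiply-accumulate
        return self.acc

# Example: a block of four weights with a single non-zero entry at index 1.
pe = ProcessorElement(weights=[0.0, 2.0, 0.0, 0.0], block_size=4)
result = pe.receive_block([1.0, 3.0, 5.0, 7.0])  # only inputs[1] is used
```

Because zero weights contribute nothing to the dot product, skipping them changes no results while reducing both storage (only non-zero values and their compact indices are kept at every memory level) and MAC work per block.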