US 11,966,835 B2
Deep neural network accelerator with fine-grained parallelism discovery
Ching-En Lee, Ann Arbor, MI (US); Yakun Shao, Santa Clara, CA (US); Angshuman Parashar, Northborough, MA (US); Joel Emer, Acton, MA (US); and Stephen W. Keckler, Austin, TX (US)
Assigned to NVIDIA CORP., Santa Clara, CA (US)
Filed by NVIDIA Corp., Santa Clara, CA (US)
Filed on Jan. 23, 2019, as Appl. No. 15/929,093.
Claims priority of provisional application 62/680,978, filed on Jun. 5, 2018.
Prior Publication US 2019/0370645 A1, Dec. 5, 2019
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A deep learning network accelerator comprising:
an encoder to compress an input activation vector and a weight vector to reduce sparsity therein, thereby generating a compressed input activation vector and a compressed weight vector;
a parallelism discovery unit to compare coordinate indexes for the compressed weight vector and for the compressed input activation vector to generate matching pairs of coordinate indexes;
a decoder to generate column selects and row selects from the matching pairs, the column selects and row selects comprising validity markers and addresses for the matching coordinate indexes; and
an array of computing elements to receive the column selects and the row selects from the decoder and to transform the column selects, the row selects, the compressed input activation vector, and the compressed weight vector into output activations of a deep learning network.
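The claimed dataflow — compress both operands, intersect their coordinate indexes to find the nonzero-times-nonzero work, then feed the matched addresses to a compute array — can be illustrated with a minimal software sketch. This is a hypothetical model for exposition only, not the patented hardware: all names and data are invented, and the per-pair multiplies that the claim's compute-element array performs in parallel are executed sequentially here.

```python
def compress(vec):
    """Encoder sketch: drop zeros, keeping (coordinate index, value) pairs."""
    return [(i, v) for i, v in enumerate(vec) if v != 0]

def match_coords(cw, ca):
    """Parallelism-discovery sketch: intersect coordinate indexes.

    Only positions where weight AND activation are nonzero contribute to
    the output, so each matching pair is an independent multiply.
    Returns (weight address, activation address) pairs into the
    compressed vectors, standing in for the claim's row/column selects.
    """
    a_pos = {i: addr for addr, (i, _) in enumerate(ca)}
    return [(w_addr, a_pos[i]) for w_addr, (i, _) in enumerate(cw) if i in a_pos]

def compute(cw, ca, pairs):
    """Compute-array sketch: one multiply-accumulate per matching pair."""
    return sum(cw[w][1] * ca[a][1] for w, a in pairs)

weights = [0, 3, 0, 2, 0, 5]
acts    = [1, 0, 0, 4, 2, 6]
cw, ca = compress(weights), compress(acts)
out = compute(cw, ca, match_coords(cw, ca))
# Nonzeros coincide only at coordinates 3 and 5: 2*4 + 5*6 = 38
```

The sketch shows why compression helps: the compute array performs two multiplies instead of the six a dense dot product would require, and the matched address pairs are mutually independent, which is the fine-grained parallelism the discovery unit exposes.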