US 12,488,228 B2
Programmable in-memory computing accelerator for low-precision deep neural network inference
Jae-sun Seo, Tempe, AZ (US); Bo Zhang, New York, NY (US); Mingoo Seok, New York, NY (US); and Shihui Yin, Mesa, AZ (US)
Assigned to ARIZONA BOARD OF REGENTS ON BEHALF OF ARIZONA STATE UNIVERSITY, Scottsdale, AZ (US); and The Trustees of Columbia University in the City of New York, New York, NY (US)
Filed by Arizona Board of Regents on behalf of Arizona State University, Scottsdale, AZ (US); and The Trustees of Columbia University in the City of New York, New York, NY (US)
Filed on Apr. 4, 2022, as Appl. No. 17/712,938.
Claims priority of provisional application 63/170,432, filed on Apr. 2, 2021.
Prior Publication US 2022/0318610 A1, Oct. 6, 2022
Int. Cl. G06F 15/80 (2006.01); G06F 7/544 (2006.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06F 15/78 (2006.01); G06N 3/04 (2023.01); G06N 3/063 (2023.01)
CPC G06N 3/063 (2013.01) [G06F 7/544 (2013.01); G06F 9/3004 (2013.01); G06F 9/3877 (2013.01); G06F 9/3885 (2013.01); G06F 9/3887 (2013.01); G06F 15/7867 (2013.01); G06N 3/04 (2013.01)]
9 Claims
1. A programmable large-scale hardware accelerator, comprising:
an activation memory;
a single-instruction-multiple-data (SIMD) processor; and
a plurality of in-memory computing (IMC) processing elements (PEs) capable of performing deep neural network inference operations at least one of serially or in parallel, each IMC PE comprising a set of IMC macros in a two-dimensional arrangement configured to run in parallel,
the accelerator being configured to execute an instruction, the executing comprising:
executing a first portion of the instruction by at least one IMC PE of the plurality of IMC PEs, the first portion comprising a multiplication-and-add (MAC) operation; and
executing a second portion of the instruction by the SIMD processor, the second portion comprising a non-MAC operation,
wherein:
the IMC PE is capable of supporting a plurality of convolution kernel sizes;
the IMC PE comprises an adder for accumulating results from the set of IMC macros;
an IMC macro of the set of IMC macros of the IMC PE comprises a plurality of static-random-access-memory (SRAM) bitcells arranged in a plurality of columns;
the IMC macro comprises a signal line, the signal line being:
capacitively coupled to bitcells in a column of the plurality of columns, and
configured to accumulate a MAC result of bitcells in the column;
the IMC macro includes a flash analog-to-digital converter to convert the accumulated MAC result to a multi-bit digital value; and
the instruction comprises:
a read address,
a write address,
an IMC PE selection,
an IMC macro selection,
one or more SIMD operands, and
a SIMD operation code.
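
For illustration only, the instruction fields and the two-part execution recited in claim 1 can be sketched as a small behavioral model. This is a minimal sketch under stated assumptions, not the patented circuit: the names `Instruction`, `flash_adc`, `macro_column_mac`, `execute`, `SIMD_OPS`, and `THRESHOLDS` are hypothetical, weights and activations are assumed to be +1/-1, each macro is reduced to a single column, the macro selection is modeled as a tuple of indices, and the ADC thresholds and the `add_relu` opcode are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed comparator thresholds for a 4-comparator flash ADC (hypothetical values).
THRESHOLDS: List[float] = [-3.0, -1.0, 1.0, 3.0]


@dataclass(frozen=True)
class Instruction:
    """The six instruction fields listed in claim 1 (field names are assumptions)."""
    read_addr: int
    write_addr: int
    pe_select: int
    macro_select: Tuple[int, ...]
    simd_operands: Tuple[int, ...]
    simd_opcode: str


def flash_adc(value: float, thresholds: List[float]) -> int:
    """Model a flash ADC: every comparator fires in parallel; the code is the count that trip."""
    return sum(1 for t in thresholds if value > t)


def macro_column_mac(weights: List[int], inputs: List[int]) -> int:
    """Model one column: the signal line accumulates the bitcell products, then the ADC digitizes."""
    analog_sum = sum(w * x for w, x in zip(weights, inputs))
    return flash_adc(analog_sum, THRESHOLDS)


# Hypothetical non-MAC operations handled by the SIMD processor.
SIMD_OPS = {
    "add": lambda acc, operand: acc + operand,
    "add_relu": lambda acc, bias: max(0, acc + bias),
}


def execute(instr: Instruction,
            activation_memory: Dict[int, object],
            pes: List[List[List[int]]]) -> int:
    """Run the MAC portion on the selected IMC PE, then the non-MAC portion on the SIMD model."""
    inputs = activation_memory[instr.read_addr]
    pe = pes[instr.pe_select]
    # First portion: selected macros produce ADC codes; the PE adder accumulates them.
    partial = sum(macro_column_mac(pe[m], inputs) for m in instr.macro_select)
    # Second portion: the SIMD processor applies the non-MAC operation.
    result = SIMD_OPS[instr.simd_opcode](partial, *instr.simd_operands)
    activation_memory[instr.write_addr] = result
    return result


if __name__ == "__main__":
    memory = {0: [1, 1, 1, -1]}
    # One PE holding two single-column macros of stored +1/-1 weights.
    pes = [[[1, -1, 1, 1], [1, 1, 1, 1]]]
    instr = Instruction(read_addr=0, write_addr=1, pe_select=0,
                        macro_select=(0, 1), simd_operands=(-1,),
                        simd_opcode="add_relu")
    print(execute(instr, memory, pes))
```

In this model the two selected macros produce ADC codes that the PE adder sums before the SIMD step applies a bias and a rectifier, mirroring the split between the MAC portion and the non-MAC portion of the instruction. A real encoding would pack the six fields into a fixed-width word; the claim does not specify bit widths, so none are assumed here.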