| CPC G06N 3/063 (2013.01) [G06F 7/544 (2013.01); G06F 9/3004 (2013.01); G06F 9/3877 (2013.01); G06F 9/3885 (2013.01); G06F 9/3887 (2013.01); G06F 15/7867 (2013.01); G06N 3/04 (2013.01)] | 9 Claims |

|
1. A programmable large-scale hardware accelerator, comprising:
an activation memory;
a single-instruction-multiple-data (SIMD) processor; and
a plurality of in-memory computing (IMC) processing elements (PEs) capable of performing deep neural network inference operations at least one of serially or in parallel, each IMC PE comprising a set of IMC macros in a two-dimensional arrangement configured to run in parallel,
the accelerator being configured to execute an instruction, the executing comprising:
executing a first portion of the instruction by at least one IMC PE of the plurality of IMC PEs, the first portion comprising a multiply-and-accumulate (MAC) operation; and
executing a second portion of the instruction by the SIMD processor, the second portion comprising a non-MAC operation,
wherein:
the IMC PE is capable of supporting a plurality of convolution kernel sizes;
the IMC PE comprises an adder for accumulating results from the set of IMC macros;
an IMC macro of the set of IMC macros of the IMC PE comprises a plurality of static-random-access-memory (SRAM) bitcells arranged in a plurality of columns;
the IMC macro comprises a signal line, the signal line being:
capacitively coupled to bitcells in a column of the plurality of columns, and
configured to accumulate a MAC result of bitcells in the column;
the IMC macro includes a flash analog-to-digital converter to convert the accumulated MAC result to a multi-bit digital value; and
the instruction comprises:
a read address,
a write address,
an IMC PE selection,
an IMC macro selection,
one or more SIMD operands, and
a SIMD operation code.
|
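
The split execution recited in claim 1 can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the claimed hardware: the names `Instruction`, `flash_adc`, `imc_macro`, `execute`, and `SIMD_OPS` are hypothetical, the weights are single-bit values, the macro selection is modeled as a tuple of macro indices, the ADC resolution is fixed at 3 bits, and the analog charge accumulation on the signal line is replaced by an ordinary sum.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instruction:
    """The six fields recited in the claim (encodings are assumptions)."""
    read_addr: int
    write_addr: int
    pe_select: int
    macro_select: Tuple[int, ...]
    simd_operands: Tuple[int, ...]
    simd_opcode: str


def flash_adc(analog: float, bits: int = 3, full_scale: float = 8.0) -> int:
    """Flash-style conversion: count the reference thresholds the input reaches."""
    step = full_scale / (2 ** bits)
    thresholds = [step * k for k in range(1, 2 ** bits)]
    return sum(1 for t in thresholds if analog >= t)


def imc_macro(weights: List[List[int]], activations: List[int]) -> List[int]:
    """One macro: each column accumulates weight*activation, then is digitized."""
    n_cols = len(weights[0])
    out = []
    for c in range(n_cols):
        # Stand-in for the capacitively coupled signal line of column c.
        analog = float(sum(weights[r][c] * activations[r]
                           for r in range(len(activations))))
        out.append(flash_adc(analog))
    return out


# Hypothetical non-MAC operations handled by the SIMD processor.
SIMD_OPS = {
    "relu": lambda v, ops: [max(0, x) for x in v],
    "add": lambda v, ops: [x + ops[0] for x in v],
}


def execute(instr: Instruction,
            act_mem: Dict[int, List[int]],
            pes: List[List[List[List[int]]]]) -> List[int]:
    """Run the MAC portion on the selected IMC PE, then the non-MAC portion on SIMD."""
    acts = act_mem[instr.read_addr]
    macros = [pes[instr.pe_select][m] for m in instr.macro_select]
    partials = [imc_macro(w, acts) for w in macros]
    mac = [sum(col) for col in zip(*partials)]       # PE-level adder
    result = SIMD_OPS[instr.simd_opcode](mac, instr.simd_operands)
    act_mem[instr.write_addr] = result                # write back to activation memory
    return result


if __name__ == "__main__":
    macro_a = [[1, 0], [1, 1], [1, 1], [0, 1]]       # 4 rows x 2 columns
    macro_b = [[1, 1], [0, 0], [0, 1], [1, 0]]
    mem = {0: [1, 1, 0, 1]}
    instr = Instruction(0, 1, 0, (0, 1), (10,), "add")
    print(execute(instr, mem, [[macro_a, macro_b]]))
```

In this model the saturating behavior of `flash_adc` is the only place where information is lost; a real macro would also see analog noise and nonlinearity, which the sketch deliberately ignores.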