US 11,726,950 B2
Compute near memory convolution accelerator
Huseyin Ekin Sumbul, Portland, OR (US); Gregory K. Chen, Portland, OR (US); Phil Knag, Hillsboro, OR (US); Raghavan Kumar, Hillsboro, OR (US); and Ram Krishnamurthy, Portland, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Sep. 28, 2019, as Appl. No. 16/586,975.
Prior Publication US 2020/0034148 A1, Jan. 30, 2020
Int. Cl. G06F 15/80 (2006.01); G06F 17/15 (2006.01); G06N 3/063 (2023.01)
CPC G06F 15/8046 (2013.01) [G06F 17/153 (2013.01); G06N 3/063 (2013.01)] 8 Claims
 
1. An integrated circuit comprising:
a memory to store one or more channels of a same filter row of a filter, each channel of the same filter row to be stored contiguously in the memory, row by row, in a channel-wise order;
an input buffer to receive one or more channels of input row vectors of input activations streamed to the input buffer, row by row, in the channel-wise order;
circuitry, including:
a multiplexer circuit to select a selected weight from a stored filter row, and
a multiplexer array to access the input activations from the input buffer based on a stride input and a weight position of the selected weight; and
at least one array of multiply and accumulate (MAC) units coupled to the circuitry, the at least one array of MAC units to compute, from the selected weight and the input activations, a partial sum for a convolution; and
wherein the circuitry enables access to the memory and the input buffer by the at least one array of MAC units to accelerate the convolution.
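The claim describes the dataflow only in structural terms. The Python sketch below is one possible software model of it, under several assumptions that are not taken from the patent. It handles a single filter row with no padding and assigns one MAC unit to each output column. The function name `row_partial_sums`, its argument layout, and that MAC-to-column mapping are all hypothetical. The flat filter memory follows the claimed layout, with each channel's row stored contiguously in channel order. The weight-select multiplexer picks one weight at a time. The multiplexer array uses the stride and that weight's position to route one input activation to each MAC unit. Each MAC unit then accumulates weight times activation into its partial sum.

```python
from typing import List, Sequence


def row_partial_sums(filter_mem: Sequence[float], kernel_width: int,
                     input_rows: Sequence[Sequence[float]],
                     stride: int) -> List[float]:
    """Hypothetical software model of the partial sum for one filter row.

    filter_mem: one filter row for all channels, stored flat and channel-wise,
        so weight (c, s) lives at index c * kernel_width + s (assumed layout).
    input_rows: input_rows[c] is the input activation row for channel c,
        as it would sit in the input buffer.
    stride: convolution stride along the row.
    """
    num_channels = len(input_rows)
    if len(filter_mem) != num_channels * kernel_width:
        raise ValueError("filter row size does not match channel count")
    input_width = len(input_rows[0])
    out_width = (input_width - kernel_width) // stride + 1

    # One accumulator per MAC unit; assumption: MAC unit o owns output column o.
    psums = [0.0] * out_width
    for c in range(num_channels):
        for s in range(kernel_width):
            # Weight-select multiplexer: pick one stored weight.
            weight = filter_mem[c * kernel_width + s]
            # Multiplexer array: from the stride and weight position s,
            # route activation column o * stride + s to MAC unit o.
            routed = [input_rows[c][o * stride + s] for o in range(out_width)]
            # MAC array: each unit multiplies the shared weight by its routed
            # activation and accumulates (done in parallel in hardware).
            for o in range(out_width):
                psums[o] += weight * routed[o]
    return psums


if __name__ == "__main__":
    filt = [1, 0, -1,   # channel 0 of the filter row
            2, 1, 0]    # channel 1 of the filter row
    rows = [[1, 2, 3, 4, 5],   # channel 0 input row
            [5, 4, 3, 2, 1]]   # channel 1 input row
    print(row_partial_sums(filt, 3, rows, stride=1))  # [12.0, 9.0, 6.0]
    print(row_partial_sums(filt, 3, rows, stride=2))  # [12.0, 6.0]
```

In this model the selected weight is shared by every MAC unit, and only the activation routing depends on the stride. For a 2D convolution, the partial sums from each filter row would still need to be accumulated across filter rows to form a complete output row.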