US 12,423,566 B2
SRAM-sharing for reconfigurable neural processing units
Jong Hoon Shin, San Jose, CA (US); Ali Shafiee Ardestani, Santa Clara, CA (US); and Joseph H. Hassoun, Los Gatos, CA (US)
Assigned to SAMSUNG ELECTRONICS CO., LTD., (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Aug. 11, 2021, as Appl. No. 17/400,094.
Claims priority of provisional application 63/209,388, filed on Jun. 10, 2021.
Prior Publication US 2022/0405557 A1, Dec. 22, 2022
Int. Cl. G06N 3/063 (2023.01)
CPC G06N 3/063 (2013.01)
17 Claims
 
1. A core of neural processing units (NPUs), comprising:
an N×N array of NPUs arranged in N rows and N columns in which N is an integer between 2 and 64 inclusive, each NPU comprising a memory, and a convolutional multiply-accumulate (MAC) circuit coupled to the memory, the memory capable of receiving, storing and outputting input feature map (IFM) values, kernel values and output feature map (OFM) values,
the N×N array of NPUs being configured to process IFM data by:
storing IFM values of an array of IFM values so that each respective row of IFM values of the array of IFM values is sequentially stored in the respective memory of NPUs located along diagonals of the N×N array of NPUs;
broadcasting an IFM value stored in the memory in each of the NPUs located along the diagonals of the N×N array of NPUs to memory of other NPUs located in a same row as the respective NPUs;
for each row of the N×N array of NPUs, multiplying an IFM value broadcast to the memory of an NPU in the row by a kernel value stored in the memory of each respective NPU in the row to form a product value PV for the NPU;
for each column of the N×N array of NPUs, adding all product values PV in a column to form an OFM value for the column;
storing each respective OFM value in the memory in an NPU located along the diagonals of the N×N array of NPUs; and
repeating broadcasting, multiplying, adding and storing until all diagonals of the N×N array of NPUs have been processed.
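
The dataflow recited in claim 1 can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the patented hardware or the applicant's implementation: it assumes that "diagonals" means wrap-around diagonals (diagonal d holds the NPUs at row r, column (r + d) mod N), that the IFM array is N×N, and that each NPU holds one kernel value. The function name diagonal_npu_core and the dictionary-based memory model are hypothetical and chosen only for illustration. Under these assumptions, the column sums reproduce an ordinary matrix product of the IFM array and the kernel array.

```python
def diagonal_npu_core(ifm, kernel):
    """Software model of the claimed dataflow (hypothetical, for illustration).

    Assumptions not taken from the claim: wrap-around diagonals, an N x N
    IFM array, and exactly one kernel value per NPU.
    """
    n = len(ifm)
    if not 2 <= n <= 64:
        raise ValueError("N must be an integer between 2 and 64 inclusive")

    # mem[r][c] models the local memory of the NPU at row r, column c.
    mem = [[{"ifm": None, "kernel": kernel[r][c], "ofm": None}
            for c in range(n)] for r in range(n)]

    # Storing: IFM row d is placed sequentially along diagonal d.
    for d in range(n):
        for r in range(n):
            mem[r][(r + d) % n]["ifm"] = ifm[d][r]

    for d in range(n):  # repeat until every diagonal has been processed
        # Broadcasting: the diagonal NPU in row r shares its IFM value
        # with all other NPUs in row r.
        bcast = [mem[r][(r + d) % n]["ifm"] for r in range(n)]

        # Multiplying: each NPU forms a product value PV from the
        # broadcast IFM value and its locally stored kernel value.
        pv = [[bcast[r] * mem[r][c]["kernel"] for c in range(n)]
              for r in range(n)]

        # Adding and storing: each column sum is one OFM value, written
        # to the NPU of that column that lies on diagonal d.
        for c in range(n):
            mem[(c - d) % n][c]["ofm"] = sum(pv[r][c] for r in range(n))

    # Read OFM row d back from diagonal d.
    return [[mem[(c - d) % n][c]["ofm"] for c in range(n)] for d in range(n)]


if __name__ == "__main__":
    print(diagonal_npu_core([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

For the 2×2 inputs shown, the model returns [[19, 22], [43, 50]], which matches the matrix product of the two arrays. Because every NPU lies on exactly one wrap-around diagonal, each local memory in this model holds one IFM value, one kernel value and one OFM value.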