CPC G06N 3/063 (2013.01); 17 Claims

1. A core of neural processing units (NPUs), comprising:
an N×N array of NPUs arranged in N rows and N columns in which N is an integer between 2 and 64 inclusive, each NPU comprising a memory, and a convolutional multiply-accumulate (MAC) circuit coupled to the memory, the memory capable of receiving, storing and outputting input feature map (IFM) values, kernel values and output feature map (OFM) values,
the N×N array of NPUs being configured to process IFM data by:
storing IFM values of an array of IFM values so that each respective row of IFM values of the array of IFM values is sequentially stored in the respective memory of NPUs located along diagonals of the N×N array of NPUs;
broadcasting an IFM value stored in the memory in each of the NPUs located along the diagonals of the N×N array of NPUs to the memory of other NPUs located in a same row as the respective NPUs;
for each row of the N×N array of NPUs, multiplying an IFM value broadcast to the memory of an NPU in the row by a kernel value stored in the memory of each respective NPU in the row to form a product value PV for the NPU;
for each column of the N×N array of NPUs, adding all product values PV in a column to form an OFM value for the column;
storing each respective OFM value in the memory in an NPU located along the diagonals of the N×N array of NPUs; and
repeating broadcasting, multiplying, adding and storing until all diagonals of the N×N array of NPUs have been processed.
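The dataflow recited in claim 1 can be illustrated with a small sequential simulation. This is a hedged sketch under stated assumptions, not the patentee's implementation. The claim does not say which diagonals are meant or how elements map onto them, so the sketch assumes N wrapped diagonals, with diagonal k made up of the NPUs at (i, (i + k) mod N). It also assumes element i of IFM row k is stored at the diagonal-k NPU in row i, and that the OFM value for column c is written to the diagonal-k NPU in that column. The names `NPU`, `diagonal` and `run_core` are hypothetical. Under these assumptions, processing diagonal k produces row k of the matrix product of the IFM array and the kernel array.

```python
from dataclasses import dataclass


@dataclass
class NPU:
    """One processing unit: memory slots plus a MAC (modelled in plain Python)."""
    kernel: float = 0.0     # kernel value held by this NPU
    ifm: float = 0.0        # IFM value stored here (each NPU lies on one wrapped diagonal)
    broadcast: float = 0.0  # IFM value received from this row's diagonal NPU
    ofm: float = 0.0        # OFM value written back by the column sum


def diagonal(k, n):
    """Assumed indexing: wrapped diagonal k is the NPUs (i, (i + k) % n)."""
    return [(i, (i + k) % n) for i in range(n)]


def run_core(ifm_rows, kernel):
    n = len(kernel)
    if not 2 <= n <= 64:
        raise ValueError("N must be an integer between 2 and 64 inclusive")
    if len(ifm_rows) != n or any(len(r) != n for r in ifm_rows + kernel):
        raise ValueError("expected N x N IFM and kernel arrays")
    grid = [[NPU(kernel=kernel[r][c]) for c in range(n)] for r in range(n)]

    # Store: IFM row k goes along diagonal k; element i lands in row i.
    for k, row in enumerate(ifm_rows):
        for i, (r, c) in enumerate(diagonal(k, n)):
            grid[r][c].ifm = row[i]

    ofm = []
    for k in range(n):  # repeat until every diagonal has been processed
        # Broadcast: each diagonal NPU copies its IFM value to every NPU in its row.
        for r, c in diagonal(k, n):
            for npu in grid[r]:
                npu.broadcast = grid[r][c].ifm
        # Multiply and add: PV = broadcast * kernel in every NPU, summed per column.
        col_sums = [
            sum(grid[r][c].broadcast * grid[r][c].kernel for r in range(n))
            for c in range(n)
        ]
        # Store: the OFM value of column c goes to the diagonal-k NPU in column c.
        for c in range(n):
            grid[(c - k) % n][c].ofm = col_sums[c]
        ofm.append(col_sums)
    return grid, ofm


grid, out = run_core([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(out)  # [[19, 22], [43, 50]]
```

Each pass over a diagonal behaves as a vector-matrix product: the row broadcast supplies one input element per row of NPUs, and the column sum acts as the reduction. Sweeping all N diagonals therefore covers every IFM row once.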