CPC G06N 3/04 (2013.01) [G06F 8/41 (2013.01); G06F 17/15 (2013.01); G06N 3/063 (2013.01)] | 20 Claims |
1. An apparatus for running a convolutional layer of a neural network, comprising:
memory units configured to store an input tensor of the convolutional layer of the neural network, the input tensor divided into a plurality of tiles, different memory units configured to store different tiles in the input tensor of the convolutional layer;
convolution engines configured to operate in parallel to run the convolutional layer, a convolution engine configured to perform a plurality of multiply-accumulate (MAC) operations on a tile in the input tensor of the convolutional layer, different convolution engines configured to perform MAC operations on different tiles in the input tensor of the convolutional layer in parallel; and
a controller configured to control, based on a first iterator and a second iterator, iterations of the convolutional layer over the plurality of tiles, an iteration of the convolutional layer comprising the plurality of MAC operations performed by the convolution engine on the tile in the input tensor of the convolutional layer, different iterations conducted in parallel by different ones of the convolution engines over different ones of the plurality of tiles, the first iterator configured to count tiles along a first dimension of the input tensor that have been processed by the convolution engines, the second iterator configured to count tiles along a second dimension of the input tensor that have been processed by the convolution engines.
|
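The control scheme recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed hardware and not the patentee's implementation. It assumes a 2-D input, models each convolution engine as a worker thread, and models the memory units as entries of a nested list of tiles. It also assumes the first iterator counts tiles along the width and the second counts tiles along the height, an assignment the claim does not specify. The names `split_into_tiles`, `mac_on_tile`, and `run_layer` are hypothetical. Each tile is convolved on its own with no overlap (halo) between neighbouring tiles, so the result is a per-tile output rather than a full-layer convolution.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(tensor, tile_h, tile_w):
    """Divide a 2-D input (list of rows) into a grid of tiles.

    tiles[r][c] stands in for the memory unit holding that tile.
    """
    rows, cols = len(tensor), len(tensor[0])
    return [
        [
            [row[c:c + tile_w] for row in tensor[r:r + tile_h]]
            for c in range(0, cols, tile_w)
        ]
        for r in range(0, rows, tile_h)
    ]


def mac_on_tile(tile, kernel):
    """One iteration: multiply-accumulate a kernel over a single tile."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(tile) - kh + 1):
        out_row = []
        for j in range(len(tile[0]) - kw + 1):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += tile[i + di][j + dj] * kernel[di][dj]  # one MAC
            out_row.append(acc)
        out.append(out_row)
    return out


def run_layer(tensor, kernel, tile_h, tile_w, num_engines):
    """Controller loop: two iterators decide which tiles are dispatched next."""
    tiles = split_into_tiles(tensor, tile_h, tile_w)
    n_rows, n_cols = len(tiles), len(tiles[0])
    results = [[None] * n_cols for _ in range(n_rows)]

    first_iter = 0   # assumption: tiles processed along the width of the current tile row
    second_iter = 0  # assumption: tile rows fully processed along the height

    with ThreadPoolExecutor(max_workers=num_engines) as engines:
        while second_iter < n_rows:
            # Dispatch up to num_engines tiles; each engine gets a different tile.
            batch = range(first_iter, min(first_iter + num_engines, n_cols))
            futures = {
                c: engines.submit(mac_on_tile, tiles[second_iter][c], kernel)
                for c in batch
            }
            for c, future in futures.items():
                results[second_iter][c] = future.result()

            first_iter += len(batch)
            if first_iter == n_cols:  # row of tiles finished
                first_iter = 0
                second_iter += 1
    return results
```

For example, calling `run_layer` on a 4x4 input with 2x2 tiles, a 2x2 kernel, and two engines processes one row of two tiles per pass of the loop, so the second iterator advances once per pass and the first iterator wraps back to zero each time.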