| CPC G06N 3/04 (2013.01) | 14 Claims |

|
1. A method comprising:
receiving first data from a memory;
providing, by a first data feeder, a portion of the first data to a vector multiply unit in vector format by providing a number of feature planes K to the vector multiply unit, the feature planes including at least a first feature plane and a second feature plane;
providing, by a second data feeder, second data including a first set of weight values to the vector multiply unit by at least providing a corresponding weight from each convolution filter of a set of convolution filters to the vector multiply unit in vector format, the first set of weight values including multiple sets of weight data, each set of weight data including L weight coefficients;
multiplying and accumulating the first data and the second data as a partial product in at least one node of the vector multiply unit, such that a product is accumulated in the at least one node of the vector multiply unit after a plurality of iterations by at least:
multiplying the first set of weight values by each pixel value in a first subset of pixel positions for the first feature plane to generate a first result;
storing the first result in a first accumulator of the node;
multiplying the first set of weight values by each pixel value in the first subset of pixel positions for the second feature plane to generate a second result; and
storing the second result in a second accumulator of the node; and
rearranging, using the first data feeder and the second data feeder, the portion of the first data and the second data so that partial products computed by the vector multiply unit for one output feature value are directed to one multiplier and one accumulator within one node of the vector multiply unit;
reading the first and second results from the node after a delay of at least K×L cycles after a previous iteration of the reading step; and
storing the first and second results in the memory.
|
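
The accumulation recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed hardware. It assumes one multiply-accumulate per cycle, one accumulator per feature plane, and exactly L pixel values per plane (one per weight coefficient); the claim itself fixes none of these details. The names `node_mac`, `planes`, and `weights` are hypothetical.

```python
def node_mac(planes, weights):
    """Simulate one node: a single multiplier and one accumulator per plane.

    planes:  K lists, each holding L pixel values (assumed layout).
    weights: the L weight coefficients of one set of weight data.
    Returns (accumulators, cycles), counting one multiply-accumulate per cycle.
    """
    K, L = len(planes), len(weights)
    accumulators = [0] * K  # accumulators[0] is the "first accumulator", etc.
    cycles = 0
    for k in range(K):
        if len(planes[k]) != L:
            raise ValueError("each plane must supply L pixel values")
        for l in range(L):
            # Partial products for plane k always land in accumulator k.
            accumulators[k] += weights[l] * planes[k][l]
            cycles += 1
    return accumulators, cycles


# Two feature planes (K = 2), three weight coefficients (L = 3).
results, cycles = node_mac([[1, 2, 3], [4, 5, 6]], [1, 2, 3])
print(results, cycles)  # [14, 32] 6
```

Under these assumptions the node is busy for K×L = 6 cycles before both results are complete, which is consistent with the claim's requirement that successive reads be separated by at least K×L cycles.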