CPC G06F 7/5443 (2013.01) [G06F 9/30105 (2013.01); G06N 3/063 (2013.01)] | 18 Claims |
1. A method for performing a two dimensional (2D) convolution operation, the method comprising:
storing, by a processor, a convolution kernel in a first storage device of the processor, the convolution kernel having dimensions x by y, wherein x is a number of rows in the convolution kernel and y is a number of columns in the convolution kernel;
storing, by the processor, in a second storage device of the processor, a first subset of element values of an input feature map having dimensions n by m, wherein n is a number of rows in the input feature map and m is a number of columns in the input feature map;
performing a first simultaneous multiplication, by the processor, of each value of the first subset of element values of the input feature map with a first element value from among the x*y elements of the convolution kernel;
shifting, by the processor, the first subset of element values one register to the left in a plurality of registers of the second storage device;
based on the shifting, performing a second simultaneous multiplication, by the processor, of each value of a second subset of element values of the input feature map with a second element value from among the x*y elements of the convolution kernel, wherein the first subset of element values of the input feature map comprises values in first to p-th column of a first row of the input feature map, and the second subset of element values of the input feature map comprises values in second to (p+1)-th column of the first row of the input feature map, and wherein the second subset of element values of the input feature map further comprises at least one element value from the first subset of element values of the input feature map;
for each remaining value of the x*y elements of the convolution kernel, shifting, by the processor, the second subset of element values of the input feature map one register to the left in the plurality of registers of the second storage device, wherein the second subset of element values of the input feature map is then stored to the second storage device;
performing, by the processor, a simultaneous multiplication of the each remaining value with a corresponding subset of element values of the input feature map;
for each simultaneous multiplication, storing, by the processor, a result of the each simultaneous multiplication in at least one accumulator connected to the processor; and
outputting, by the processor, an output from the at least one accumulator as a first row of an output feature map (OFM),
the first element value from among the x*y elements of the convolution kernel in the first simultaneous multiplication being different from the second element value from among the x*y elements of the convolution kernel in the second simultaneous multiplication,
wherein the at least one accumulator comprises a first accumulator configured to add the result of the each simultaneous multiplication to an input value, and a second accumulator coupled to an output of the first accumulator and configured to add the result of the each simultaneous multiplication to an input value and output the first row of the OFM.
|
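The shift-and-multiply sequence recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed hardware: the function name `first_ofm_row`, the Python lists standing in for the registers, and the choice p = m - y + 1 (one lane per output column, no padding, stride 1) are all assumptions made for illustration. The two-stage accumulator arrangement of the final wherein clause is not modeled; a single list of accumulators is used instead.

```python
def first_ofm_row(ifm, kernel):
    """Compute the first OFM row by broadcasting one kernel element at a
    time across p lanes and shifting the lane registers left by one.

    Illustrative assumptions: stride 1, no padding, p = m - y + 1 lanes.
    """
    x, y = len(kernel), len(kernel[0])   # kernel is x rows by y columns
    n, m = len(ifm), len(ifm[0])         # input feature map is n rows by m columns
    assert n >= x and m >= y, "kernel must fit inside the input feature map"

    p = m - y + 1                        # number of lanes (assumed)
    acc = [0] * p                        # one accumulator per lane (simplified)

    for r in range(x):
        # Load columns 1..p (indices 0..p-1) of IFM row r into the registers.
        regs = list(ifm[r][:p])
        for c in range(y):
            w = kernel[r][c]             # one kernel element, shared by all lanes
            # "Simultaneous" multiplication, modeled as a per-lane loop.
            acc = [a + w * v for a, v in zip(acc, regs)]
            if c + 1 < y:
                # Shift one register to the left; the next IFM column
                # enters on the right, giving columns 2..(p+1), and so on.
                regs = regs[1:] + [ifm[r][p + c]]
    return acc


ifm = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
kernel = [[1, 0],
          [0, 1]]
print(first_ofm_row(ifm, kernel))  # [7, 9, 11]
```

In this sketch, consecutive register contents overlap in p - 1 values, which corresponds to the claim's requirement that the second subset include at least one element value from the first subset. Each kernel element is applied to all lanes at once, so a full kernel row costs y multiply steps and y - 1 shifts.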