| CPC G06N 3/063 (2013.01) [G06F 15/7825 (2013.01); G06F 17/16 (2013.01)] | 21 Claims |

1. An integrated chip for computing an output for a convolutional neural network, the integrated chip comprising:
a plurality of convolution layers, wherein a convolution layer in the plurality of convolution layers comprises a plurality of kernels and a kernel in the plurality of kernels comprises a respective matrix structure of weights;
wherein, for the convolution layer in the plurality of convolution layers, the integrated chip is configured to execute instructions that cause the integrated chip to perform intra-crossbar parallelization and inter-crossbar parallelization that compute simultaneously on input data points via a method comprising:
flattening each kernel in the plurality of kernels into a vector;
grouping the vectors into a vector matrix, wherein the vector matrix comprises a plurality of lines;
replicating and storing duplicates of the vector matrix according to a number and a size of the kernels of the convolution layer of the convolutional neural network and a crossbar size of a crossbar of the integrated chip, wherein the duplicates are stored in unused space of the crossbar, forming a crossbar matrix; and
computing a first convolution of the convolution layer as a dot product of an input activation vector and the crossbar matrix, wherein the first convolution of the convolution layer corresponds to smaller weights than a second convolution of the convolution layer, so as to parallelize separate iterations in the convolution layer in alignment with the intra-crossbar parallelization, and wherein the duplicates are computed simultaneously on the input data points to perform the intra-crossbar parallelization and the inter-crossbar parallelization.
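The claimed method (flatten kernels, group them into a vector matrix, replicate duplicates into unused crossbar space, and evaluate the convolution as a single dot product) can be sketched in NumPy. This is a minimal illustration only: the kernel count, kernel size, crossbar dimensions, and the block-diagonal placement of the duplicates are assumptions for the sake of the example, not values or layouts fixed by the claim.

```python
import numpy as np

# Hypothetical sizes (not specified by the claim).
C, KH, KW, K = 3, 3, 3, 4          # input channels, kernel height/width, kernel count
XBAR_ROWS, XBAR_COLS = 64, 32      # assumed crossbar dimensions

rng = np.random.default_rng(0)
kernels = rng.standard_normal((K, C, KH, KW))

# Step 1: flatten each kernel into a vector of length C*KH*KW.
vectors = kernels.reshape(K, -1)            # shape (4, 27)

# Step 2: group the vectors into a vector matrix, one kernel per column.
vector_matrix = vectors.T                   # shape (27, 4)
rows, cols = vector_matrix.shape

# Step 3: replicate the vector matrix into the crossbar's unused space.
# Duplicates are placed block-diagonally (an assumed layout) so that
# each duplicate sees a different slice of the input activation vector.
n_dup = min(XBAR_ROWS // rows, XBAR_COLS // cols)
crossbar = np.zeros((XBAR_ROWS, XBAR_COLS))
for d in range(n_dup):
    crossbar[d * rows:(d + 1) * rows, d * cols:(d + 1) * cols] = vector_matrix

# Step 4: one crossbar dot product computes several convolution windows
# at once (intra-crossbar parallelization): concatenate n_dup flattened
# input windows into a single activation vector and multiply.
patches = rng.standard_normal((n_dup, rows))   # n_dup input windows
padded = np.zeros(XBAR_ROWS)
padded[:n_dup * rows] = patches.reshape(-1)
out = padded @ crossbar                        # shape (XBAR_COLS,)

# Column group d of the output holds the dot products for window d.
for d in range(n_dup):
    assert np.allclose(out[d * cols:(d + 1) * cols],
                       patches[d] @ vector_matrix)
```

With these sizes, `n_dup = 2` duplicates fit, so one analog read produces the outputs of two convolution windows; inter-crossbar parallelization would repeat the same layout across additional crossbars processing further windows in parallel.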