| CPC G06N 3/063 (2013.01) [G06F 17/16 (2013.01); G06N 3/08 (2013.01)] | 19 Claims |

|
1. A method of executing matrix multiply operations for a neural network (NN), the method comprising, using a computer processor comprising a plurality of cores and a shared cache shared among the cores, each core associated with a local cache used by that core only:
partitioning input data for a NN layer into a set of tiles using a parameter T, each tile being of size T×T;
using a parameter R, partitioning the set of tiles into blocks of R tiles each; and
for each block of R tiles, performing by a single core:
a transform operation on the R tiles to produce a set of transformed data matrices, the set of transformed data matrices stored in the local cache of the single core; and
a set of multiply operations, each multiply operation using a transformed data matrix of the set of transformed data matrices and a transformed kernel matrix from a set of transformed kernel matrices, the set of transformed kernel matrices stored in the shared cache, wherein a result of at least one of the multiply operations is stored in a location used to store a transformed data matrix.
|
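The steps recited in claim 1 can be illustrated with a minimal, single-threaded sketch. It is not the patented implementation, and it rests on several assumptions: the function names are hypothetical, the transform is a placeholder (a transpose standing in for whatever input transform is actually used), the input dimensions are divisible by T, and `local` and `shared_kernels` are ordinary Python lists that merely model the per-core local cache and the shared cache. The point it shows is the last limitation of the claim: each product is written back into the buffer that held the transformed data matrix.

```python
def partition_into_tiles(data, T):
    """Split a 2D list (dimensions assumed divisible by T) into T x T tiles, row-major."""
    rows, cols = len(data), len(data[0])
    return [
        [row[c:c + T] for row in data[r:r + T]]
        for r in range(0, rows, T)
        for c in range(0, cols, T)
    ]


def partition_into_blocks(tiles, R):
    """Group tiles into blocks of R tiles (the last block may be shorter)."""
    return [tiles[i:i + R] for i in range(0, len(tiles), R)]


def transform_tile(tile):
    """Placeholder transform (assumption): transpose the tile."""
    return [list(col) for col in zip(*tile)]


def multiply_in_place(buf, kernel):
    """Overwrite buf with buf @ kernel, one row at a time.

    Row i of the product depends only on row i of buf, so each row can be
    replaced as soon as it is computed and no second T x T buffer is needed.
    """
    inner, n = len(kernel), len(kernel[0])
    for i, row in enumerate(buf):
        buf[i] = [sum(row[k] * kernel[k][j] for k in range(inner)) for j in range(n)]


def process_block(block, shared_kernels):
    """Work done for one block of R tiles (by a single core in the claim)."""
    # Transformed data matrices; this list models the core's local cache.
    local = [transform_tile(tile) for tile in block]
    # One multiply per transformed data matrix, reading a transformed kernel
    # matrix from the list that models the shared cache.
    for j, buf in enumerate(local):
        multiply_in_place(buf, shared_kernels[j % len(shared_kernels)])
    return local


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    tiles = partition_into_tiles(data, T=2)
    blocks = partition_into_blocks(tiles, R=2)
    kernels = [[[2, 0], [0, 2]]]  # a single hypothetical transformed kernel matrix
    for block in blocks:
        print(process_block(block, kernels))
```

In a real multi-core setting each call to `process_block` would be dispatched to a different core; the sketch runs the blocks serially only to keep the example self-contained.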