US 12,443,833 B2
Systems and methods for neural network convolutional layer matrix multiplication using cache memory
Rati Gelashvili, Cambridge, MA (US)
Assigned to RED HAT, INC., Raleigh, NC (US)
Appl. No. 17/271,326
Filed by Neuralmagic Inc., Somerville, MA (US)
PCT Filed Aug. 27, 2019, PCT No. PCT/US2019/048236
§ 371(c)(1), (2) Date Feb. 25, 2021.
PCT Pub. No. WO2020/046859, PCT Pub. Date Mar. 5, 2020.
Claims priority of provisional application 62/723,350, filed on Aug. 27, 2018.
Prior Publication US 2021/0201124 A1, Jul. 1, 2021
Int. Cl. G06F 17/00 (2019.01); G06F 17/16 (2006.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/063 (2013.01) [G06F 17/16 (2013.01); G06N 3/08 (2013.01)] 19 Claims
OG exemplary drawing
 
1. A method of executing matrix multiply operations for a neural network (NN), the method comprising, using a computer processor comprising a plurality of cores and a shared cache shared among the cores, each core associated with a local cache used by that core only:
partitioning input data for a NN layer into a set of tiles using a parameter T, each tile being of size T×T;
using a parameter R, partitioning the set of tiles into blocks of R tiles each; and
for each block of R tiles, performing by a single core:
a transform operation on the R tiles to produce a set of transformed data matrices, the set of transformed data matrices stored in the local cache of the single core; and
a set of multiply operations, each multiply operation using a transformed data matrix of the set of transformed data matrices and a transformed kernel matrix from a set of transformed kernel matrices, the set of transformed kernel matrices stored in the shared cache, wherein a result of at least one of the multiply operations is stored in a location used to store a transformed data matrix.
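The claimed steps can be sketched in NumPy as a single-core illustration. This is a hypothetical reading of the claim, not the patented implementation: the function names (`partition_tiles`, `blocks_of`, `process_block`), the choice of transform matrix `B`, and the use of an elementwise multiply (as in Winograd-style convolution, which the transform-then-multiply pattern resembles) are all assumptions; the claim itself leaves the transform and multiply operations unspecified, and the cache-placement behavior (local vs. shared cache) is only mirrored here by which buffers are reused.

```python
import numpy as np

def partition_tiles(x, T):
    """Partition a 2-D input into non-overlapping T x T tiles (claim: 'each
    tile being of size TxT'). Assumes the input dimensions divide evenly."""
    H, W = x.shape
    assert H % T == 0 and W % T == 0
    return [x[i:i + T, j:j + T].copy()
            for i in range(0, H, T)
            for j in range(0, W, T)]

def blocks_of(tiles, R):
    """Group the tile list into blocks of R tiles each (claim: 'partitioning
    the set of tiles into blocks of R tiles each')."""
    return [tiles[k:k + R] for k in range(0, len(tiles), R)]

def process_block(block, B, transformed_kernels):
    """Per-core work for one block. B is a hypothetical T x T input-transform
    matrix (e.g., a Winograd-style transform); transformed_kernels stands in
    for the set kept in the shared cache. The multiply result overwrites the
    transformed-data buffer, mirroring the claim's in-place storage."""
    # Transform each of the R tiles: V = B^T d B (stand-in for the claimed
    # transform; in the claim these matrices live in the core's local cache).
    transformed = [B.T @ d @ B for d in block]
    outputs = []
    for V in transformed:
        for U in transformed_kernels:
            # Elementwise multiply; the result reuses the slot that held
            # the transformed data matrix, as the claim permits.
            V = U * V
        outputs.append(V)
    return outputs
```

With `B` set to the identity and a single all-ones kernel, the pipeline degenerates to a pass-through, which makes the tiling and blocking behavior easy to check; a real deployment would run `process_block` once per block on a separate core, with block sizes chosen so the R transformed tiles fit in that core's local cache.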