US 12,033,053 B1
System and method of determining and executing deep tensor columns in neural networks
Alexander Matveev, Cambridge, MA (US); Nir Shavit, Cambridge, MA (US); and Govind Ramnarayan, Somerville, MA (US)
Assigned to NEURALMAGIC, INC., Somerville, MA (US)
Filed by Neuralmagic Inc., Somerville, MA (US)
Filed on Nov. 23, 2022, as Appl. No. 17/993,463.
Application 17/993,463 is a continuation of application No. 17/547,458, filed on Dec. 10, 2021, granted, now 11,556,757.
Claims priority of provisional application 63/270,291, filed on Oct. 21, 2021.
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/04 (2023.01)
CPC G06N 3/04 (2013.01) 16 Claims
 
1. A method of executing a neural network (NN), the method comprising:
for each of a plurality of sub-tensor columns, each sub-tensor column comprising computations from portions of a plurality of layers of the NN:
executing the sub-tensor column so that data usage of the sub-tensor column during execution is entirely within a first layer of cache;
the sub-tensor column saving its output entirely within a second layer of cache, the first layer of cache being lower than the second layer of cache; and
executing a second plurality of sub-tensor columns, each of the second plurality of sub-tensor columns receiving its input from the second layer of cache;
wherein a first plurality of sub-tensor columns forms a first tensor column, and a second plurality of sub-tensor columns forms a second tensor column, an output from the first tensor column and an output from the second tensor column not overlapping.
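The claim describes a schedule rather than an implementation, so the Python sketch below shows one possible reading of it. It does not reproduce Neural Magic's implementation. Every name here is a hypothetical illustration: `stage_a`/`stage_b`, `col_width`, `tile_a`, `tile_b`, and `l2_buffer`. The sketch makes several assumptions. The network is a stack of 1D "valid" convolutions with ReLU. Each tensor column produces a non-overlapping slice of the final output. The first group of sub-tensor columns runs the early layers over small tiles, standing in for working sets that fit the lower cache level, and writes into `l2_buffer`, which stands in for the second cache level. The second group of sub-tensor columns reads its inputs only from that buffer. Python cannot pin data to a particular cache, so tile sizes here only mimic capacity limits. What the sketch does show is the dependency bookkeeping, and it checks that the tiled schedule reproduces plain layer-by-layer execution.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Kernel = Sequence[float]


def conv1d_valid_relu(x: Sequence[float], w: Kernel) -> List[float]:
    """'Valid' 1D correlation followed by ReLU: y[i] = max(0, sum_j w[j] * x[i + j])."""
    k = len(w)
    return [max(0.0, sum(w[j] * x[i + j] for j in range(k)))
            for i in range(len(x) - k + 1)]


def run_layers(x: Sequence[float], kernels: Sequence[Kernel]) -> List[float]:
    """Reference execution: run each layer over the whole tensor, one after another."""
    out = list(x)
    for w in kernels:
        out = conv1d_valid_relu(out, w)
    return out


def input_span(lo: int, hi: int, kernels: Sequence[Kernel]) -> Tuple[int, int]:
    """Input range [lo, hi) that outputs [lo, hi) of `kernels` depend on.

    With 'valid' convolutions, output i reads inputs i .. i + k - 1, so each
    layer extends the upper bound by k - 1 and leaves the lower bound unchanged.
    """
    for w in kernels:
        hi += len(w) - 1
    return lo, hi


def tiles(lo: int, hi: int, size: int) -> Iterator[Tuple[int, int]]:
    """Split [lo, hi) into consecutive, non-overlapping tiles of at most `size`."""
    for start in range(lo, hi, size):
        yield start, min(start + size, hi)


def run_tensor_columns(x: Sequence[float], stage_a: Sequence[Kernel],
                       stage_b: Sequence[Kernel], col_width: int,
                       tile_a: int, tile_b: int) -> List[float]:
    """Hypothetical two-stage tensor-column schedule (illustrative only)."""
    shrink = sum(len(w) - 1 for w in list(stage_a) + list(stage_b))
    n_out = len(x) - shrink
    result: List[float] = [0.0] * max(n_out, 0)
    # One tensor column per non-overlapping slice of the final output.
    for c_lo, c_hi in tiles(0, n_out, col_width):
        # Range of stage-A outputs (intermediate data) this tensor column needs.
        m_lo, m_hi = input_span(c_lo, c_hi, stage_b)
        l2_buffer: Dict[int, float] = {}  # stand-in for the second (larger) cache level
        # First group of sub-tensor columns: small tiles through stage A.
        for s_lo, s_hi in tiles(m_lo, m_hi, tile_a):
            i_lo, i_hi = input_span(s_lo, s_hi, stage_a)
            part = run_layers(x[i_lo:i_hi], stage_a)
            for offset, value in enumerate(part):
                l2_buffer[s_lo + offset] = value
        # Second group of sub-tensor columns: inputs come only from l2_buffer.
        for s_lo, s_hi in tiles(c_lo, c_hi, tile_b):
            i_lo, i_hi = input_span(s_lo, s_hi, stage_b)
            part = run_layers([l2_buffer[i] for i in range(i_lo, i_hi)], stage_b)
            result[s_lo:s_hi] = part
    return result


if __name__ == "__main__":
    x = [float((i * 7) % 11 - 5) for i in range(40)]
    stage_a = [[0.5, -1.0, 0.25], [1.0, 0.5]]
    stage_b = [[-0.5, 1.0, 1.0], [0.25, 0.25, 0.25, 0.25]]
    reference = run_layers(x, stage_a + stage_b)
    tiled = run_tensor_columns(x, stage_a, stage_b, col_width=8, tile_a=3, tile_b=4)
    assert tiled == reference
```

Each output element is computed from the same inputs, with the same summation order, in both schedules, so the tiled result matches the reference exactly. The cost of tiling appears as recomputation. Neighbouring tensor columns each recompute the few stage-A values in the "halo" that their stage-B inputs share. This is the usual trade-off when deep layer stacks are split into independent columns.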