CPC G06N 3/04 (2013.01) | 16 Claims
1. A method of executing a neural network (NN), the method comprising:
for each of a plurality of sub-tensor columns, each sub-tensor column comprising computations from portions of a plurality of layers of the NN:
executing the sub-tensor column so that data usage of the sub-tensor column during execution is entirely within a first layer of cache;
the sub-tensor column saving its output entirely within a second layer of cache, the first layer of cache being lower than the second layer of cache; and
executing a second plurality of sub-tensor columns, each of the second plurality of sub-tensor columns receiving its input from the second layer of cache;
wherein a first plurality of sub-tensor columns forms a first tensor column, and a second plurality of sub-tensor columns forms a second tensor column, an output from the first tensor column and an output from the second tensor column not overlapping.
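The execution order recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a toy network of 1-D "valid" convolutions with ReLU, and every name in it (`conv1d_valid`, `run_layers`, `run_in_columns`, the `tile` parameter, the kernels) is hypothetical. Python cannot pin data to a particular cache level, so the per-tile working set only stands in for the lower cache layer, and the `shared` list only stands in for the higher cache layer that the second group of columns reads from.

```python
def conv1d_valid(x, kernel):
    """One toy layer: 'valid' 1-D correlation followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)))
            for i in range(len(x) - k + 1)]


def run_layers(x, kernels):
    """Reference execution: finish each layer completely before the next."""
    for kernel in kernels:
        x = conv1d_valid(x, kernel)
    return x


def run_in_columns(x, kernels, tile):
    """Depth-first execution: each output tile is carried through all layers.

    Each loop iteration plays the role of one sub-tensor column. Its working
    set is small (tile + halo elements), which is what would let it stay in a
    lower cache level on real hardware (an assumption, not enforced here).
    """
    halo = sum(len(k) - 1 for k in kernels)   # extra inputs the layers consume
    out_len = len(x) - halo
    out = [0.0] * out_len                     # stands in for the higher cache level
    for start in range(0, out_len, tile):
        stop = min(start + tile, out_len)
        # Inputs of neighbouring columns overlap by `halo`; outputs do not overlap.
        out[start:stop] = run_layers(x[start:stop + halo], kernels)
    return out


# Hypothetical data and weights, chosen only for illustration.
x = [float((7 * i) % 11 - 5) for i in range(64)]
stage1 = [[0.5, -1.0, 0.25], [1.0, 0.5, -0.5]]    # layers of the first group
stage2 = [[-0.25, 1.0, 0.5], [0.5, 0.5, -1.0]]    # layers of the second group

shared = run_in_columns(x, stage1, tile=8)        # first group writes the shared buffer
result = run_in_columns(shared, stage2, tile=8)   # second group reads from it
reference = run_layers(x, stage1 + stage2)        # ordinary layer-by-layer execution
```

In this sketch the column-wise result matches the layer-by-layer result exactly, because each output element is computed from the same inputs in the same order. On real hardware the tile size would be chosen so that a column's working set fits the targeted cache level, a tuning decision this sketch does not model.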