CPC G06N 3/063 (2013.01) [G06F 9/3814 (2013.01); G06F 9/3877 (2013.01); G06F 9/4498 (2018.02); G06F 9/5027 (2013.01); G06N 5/04 (2013.01)] | 18 Claims |
1. A neural network accelerator, comprising:
a memory;
a plurality of processing engines coupled together and configured to perform arithmetic operations in support of an inference performed using the neural network accelerator, wherein the plurality of processing engines are implemented using a plurality of processing elements; and
a schedule-aware tensor data distribution circuitry configured to:
load tensor data into the plurality of processing engines in a load phase;
extract output data from the plurality of processing engines in an extraction phase, wherein extracting the output data from the plurality of processing engines is performed in a row-wise organization or a column-wise organization;
reorganize the extracted output data based at least in part on a schedule for a next layer after a current layer, wherein reorganizing comprises reshaping the extracted output data for storage in the memory to reduce a number of accesses to the memory for the next layer by changing a shape in which the extracted output data is stored in the memory based at least in part on a type of the next layer, and wherein changing the shape comprises converting the extracted output data from the row-wise organization to the column-wise organization, or from the column-wise organization to the row-wise organization, based on a specification of the next layer; and
store the reorganized extracted output data to the memory.
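The reorganization limitation above can be illustrated with a minimal sketch. This is not the patented hardware implementation, only a software analogy under assumed conventions: the output tensor is extracted row-wise from the processing engines, and a hypothetical schedule field for the next layer (`next_layer_access`) decides whether the data is transposed to a column-wise organization before being flattened into linear memory, so that the next layer can read it with fewer strided memory accesses. All function and parameter names here are illustrative assumptions, not terms from the claim.

```python
def reorganize(extracted, next_layer_access):
    """Reorganize row-wise extracted output based on the next layer's schedule.

    extracted: list of rows (row-wise organization, as pulled from the
        processing engines in the extraction phase).
    next_layer_access: assumed schedule field, "row-wise" or "column-wise".
    Returns the data in the organization the next layer will consume.
    """
    if next_layer_access == "column-wise":
        # Change the shape: column-wise organization from the row-wise one
        # (a transpose), so the next layer streams columns contiguously.
        return [list(col) for col in zip(*extracted)]
    # Row-wise consumer: keep the row-wise organization as-is.
    return [list(row) for row in extracted]


def store(organized):
    """Flatten the reorganized data for storage in a linear memory."""
    return [value for group in organized for value in group]
```

For example, a 2x3 output extracted row-wise and destined for a column-wise consumer would be transposed to three rows of two elements before flattening, so consecutive memory words hold one column each rather than forcing the next layer into strided reads.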