| CPC G06N 3/08 (2013.01) [G06F 13/28 (2013.01)] | 20 Claims |

|
1. A neural network processor comprising:
one or more direct memory access (DMA) engines;
a state buffer memory having a plurality of row partitions organized into row groups; and
a crossbar circuit having a transpose memory coupled to a plurality of output ports, each output port operable to write to a corresponding row group of the state buffer memory,
wherein the one or more DMA engines are operable to issue a set of write transactions to write a data array to the state buffer memory,
wherein the crossbar circuit is operable to:
determine that the set of write transactions is for a transpose write;
store write data for each write transaction diagonally in the transpose memory;
rotate each row of data in the transpose memory; and
provide rotated data from each row of the transpose memory to a corresponding output port to write to the state buffer memory, and
wherein each of the write transactions in the set of write transactions includes an opcode to provide flow control information to manage transpose credits used for accessing the transpose memory.
|
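The store-diagonally-then-rotate steps recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuit: it assumes a square N x N transpose memory, assumes that element j of write transaction i lands in memory row j at column (i + j) mod N, and assumes that row j is then rotated left by j positions. The claim does not specify these offsets or the rotation direction. The function name `transpose_via_diagonal_store` and its variables are hypothetical, and the opcode-based transpose-credit flow control is not modeled.

```python
def transpose_via_diagonal_store(data):
    """Model a transpose write: diagonal store, per-row rotate, row readout.

    `data` is a square list of lists; data[i] stands in for the payload of
    write transaction i. The indexing scheme is an assumption made for
    illustration, not taken from the claim text.
    """
    n = len(data)
    if any(len(txn) != n for txn in data):
        raise ValueError("this sketch assumes a square N x N data array")

    # Hypothetical transpose memory, initially empty.
    mem = [[None] * n for _ in range(n)]

    # Diagonal store: element j of transaction i goes to row j,
    # column (i + j) mod n, so each transaction occupies one
    # (wrapped) diagonal and touches every row exactly once.
    for i, txn in enumerate(data):
        for j, value in enumerate(txn):
            mem[j][(i + j) % n] = value

    # Rotate row j left by j so that column i holds data[i][j].
    rotated = [row[j:] + row[:j] for j, row in enumerate(mem)]

    # Row j would now be handed to output port j (one row group each).
    return rotated


if __name__ == "__main__":
    array = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    for port, row in enumerate(transpose_via_diagonal_store(array)):
        print(f"output port {port}: {row}")
```

In this model, row j of the rotated memory equals column j of the input array, so writing each row through its own output port yields the transposed array. One plausible motivation for the diagonal placement is that each incoming transaction spreads across all rows of the memory instead of filling a single row, though the claim itself does not state the rationale.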