US 12,423,580 B1
Crossbar based transpose data transfers
Patricio Kaplan, Palo Alto, CA (US); and Ron Diamant, San Jose, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Mar. 31, 2023, as Appl. No. 18/194,043.
Int. Cl. G06F 13/28 (2006.01); G06N 3/08 (2023.01)
CPC G06N 3/08 (2013.01) [G06F 13/28 (2013.01)]
20 Claims
 
1. A neural network processor comprising:
one or more direct memory access (DMA) engines;
a state buffer memory having a plurality of row partitions organized into row groups; and
a crossbar circuit having a transpose memory coupled to a plurality of output ports, each output port operable to write to a corresponding row group of the state buffer memory,
wherein the one or more DMA engines are operable to issue a set of write transactions to write a data array to the state buffer memory,
wherein the crossbar circuit is operable to:
determine that the set of write transactions is for a transpose write;
store write data for each write transaction diagonally in the transpose memory;
rotate each row of data in the transpose memory; and
provide rotated data from each row of the transpose memory to a corresponding output port to write to the state buffer memory, and
wherein each of the write transactions in the set of write transactions includes an opcode to provide flow control information to manage transpose credits used for accessing the transpose memory.
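
The diagonal store and row rotation recited in claim 1 can be illustrated in software. The following is a minimal sketch only, not the claimed circuit. It assumes a square n x n data array, one write transaction per array row, and one particular indexing scheme (element j of transaction i is placed at row j, column (i + j) mod n, and row j is then rotated left by j positions); the claim does not fix these details. The function names diagonal_store, rotate_rows, and transpose_write are hypothetical, and the transpose credits and opcode-based flow control are not modeled.

```python
from typing import List, Sequence


def diagonal_store(rows: Sequence[Sequence[int]]) -> List[List[int]]:
    """Model the transpose memory after a set of write transactions.

    Assumed scheme: element j of write transaction i is stored at
    memory[j][(i + j) % n], so each transaction lands on a diagonal.
    """
    n = len(rows)
    memory = [[0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != n:
            raise ValueError("sketch assumes a square n x n data array")
        for j, value in enumerate(row):
            memory[j][(i + j) % n] = value
    return memory


def rotate_rows(memory: Sequence[Sequence[int]]) -> List[List[int]]:
    """Rotate memory row j left by j positions (assumed rotation amount)."""
    return [list(row[j:]) + list(row[:j]) for j, row in enumerate(memory)]


def transpose_write(rows: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return the rows handed to the output ports.

    Row j of the result models the data provided to output port j,
    which equals column j of the original data array.
    """
    return rotate_rows(diagonal_store(rows))


if __name__ == "__main__":
    array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(diagonal_store(array))   # [[1, 4, 7], [8, 2, 5], [6, 9, 3]]
    print(transpose_write(array))  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

Under this assumed indexing, each write transaction touches every memory row exactly once and a different column in each row, and after rotation row j holds column j of the original array, so each row can be passed to its output port as one row of the transposed array.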