US 12,412,081 B2
Method for permuting dimensions of a multi-dimensional tensor
Anton Bondarenko, Lund (SE); Anil Kumar Metla, Lund (SE); Robert David Hughes, Cambridge (GB); and Joshua Mark Jian Li Slater, Cambridge (GB)
Assigned to Arm Limited, Cambridge (GB)
Filed by Arm Limited, Cambridge (GB)
Filed on Oct. 26, 2020, as Appl. No. 17/080,302.
Prior Publication US 2022/0129744 A1, Apr. 28, 2022
Int. Cl. G06F 15/80 (2006.01); G06F 7/76 (2006.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/08 (2013.01) [G06F 7/76 (2013.01); G06F 15/8092 (2013.01); G06N 3/063 (2013.01)] 9 Claims
 
1. A method performed by a processor comprising a plurality of compute engines each of which comprises a programmable engine and a multiply-accumulate engine, each programmable engine operating on up to a maximum number of tensor values in a cycle, each programmable engine configured to operate on a slice of data including data from one channel of a multi-dimensional tensor at a time such that channels of the multi-dimensional tensor are parallelized across multiple compute engines, wherein each multiply-accumulate engine is configured to perform matrix multiply operations on received data to generate output data, and a respective programmable engine is configured to receive the output data from the multiply-accumulate engine in order to perform a permutation function on the output data, wherein the method permutes dimensions of the multi-dimensional tensor, which contains an array of tensor values in three or more dimensions that are stored in a first storage unit, the method comprising:
transferring the array of tensor values from the first storage unit to a second storage unit by reading tensor values from the first storage unit that are arrayed along a first dimension of the multi-dimensional tensor and writing the corresponding tensor values to the second storage unit in locations corresponding to a second dimension of the multi-dimensional tensor that is different from the first dimension, thereby reordering the tensor values; and
a plurality of programmable engines of the plurality of compute engines permuting a pair of dimensions of the multi-dimensional tensor in parallel by sequentially:
reading sub-blocks of the multi-dimensional tensor from a local storage,
permuting the pair of dimensions of the sub-blocks of the multi-dimensional tensor and
writing the permuted sub-blocks to the local storage of the processor, wherein the sub-blocks are read from and written to the local storage using addresses in the local storage so as to re-order the sub-blocks to complete the permutation of the pair of dimensions across the multi-dimensional tensor, wherein the local storage is one of the first storage unit and the second storage unit.
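The two steps of claim 1 can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not the claimed hardware implementation. Storage units are modeled as flat Python lists in row-major order. `transfer_with_reorder` (a hypothetical name) performs the reordering copy: it reads along one input dimension and writes to addresses that step along a different output dimension. `blocked_swap_hw` (also hypothetical) swaps the H and W dimensions of a (C, H, W) tensor. It reads `block` x `block` sub-blocks from a local storage list, transposes each one, and writes it to the mirrored block address. Each channel is treated as one compute engine. The per-channel loop stands in for engines that the claim runs in parallel. The (C, H, W) layout and the square block size are assumptions made for illustration.

```python
from itertools import product


def strides(shape):
    """Row-major strides (in elements) for a tensor shape."""
    s = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        s[i] = s[i + 1] * shape[i + 1]
    return s


def transfer_with_reorder(src, shape, perm):
    """Copy a flat row-major tensor from a 'first storage' list into a new
    'second storage' list whose dimension k is input dimension perm[k].
    The reordering happens during the copy itself: each value's destination
    address is computed from its permuted index."""
    out_shape = [shape[p] for p in perm]
    in_strides, out_strides = strides(shape), strides(out_shape)
    dst = [None] * len(src)
    for idx in product(*(range(n) for n in shape)):
        src_addr = sum(i * s for i, s in zip(idx, in_strides))
        out_idx = [idx[p] for p in perm]
        dst_addr = sum(i * s for i, s in zip(out_idx, out_strides))
        dst[dst_addr] = src[src_addr]
    return dst, out_shape


def blocked_swap_hw(local, in_base, out_base, c_dim, h_dim, w_dim, block):
    """Swap H and W of a (C, H, W) tensor stored in `local` at in_base,
    writing a (C, W, H) tensor at out_base in the same list.
    Hypothetical model: one loop iteration over `c` stands for one compute
    engine; in hardware, the engines would process channels in parallel."""
    for c in range(c_dim):
        for bh in range(0, h_dim, block):
            for bw in range(0, w_dim, block):
                hs = min(block, h_dim - bh)  # edge blocks may be smaller
                ws = min(block, w_dim - bw)
                # 1. Read one sub-block from local storage.
                sub = [[local[in_base + (c * h_dim + bh + i) * w_dim + bw + j]
                        for j in range(ws)] for i in range(hs)]
                # 2. Permute the (H, W) pair inside the sub-block.
                sub_t = [[sub[i][j] for i in range(hs)] for j in range(ws)]
                # 3. Write it back at the mirrored block position
                #    (bh, bw) -> (bw, bh), using output-layout addresses.
                for j in range(ws):
                    for i in range(hs):
                        addr = out_base + (c * w_dim + bw + j) * h_dim + bh + i
                        local[addr] = sub_t[j][i]


if __name__ == "__main__":
    C, H, W = 2, 3, 5
    first_storage = list(range(C * H * W))
    second_storage, new_shape = transfer_with_reorder(
        first_storage, [C, H, W], [0, 2, 1])
    local = first_storage + [None] * len(first_storage)
    blocked_swap_hw(local, 0, len(first_storage), C, H, W, block=2)
    # Both routes should yield the same (C, W, H) layout.
    assert local[len(first_storage):] == second_storage
    print(new_shape, second_storage[:H])
```

The blocked route keeps each read and write confined to a small tile. That is why it suits engines with a per-cycle limit on how many tensor values they can handle. The address arithmetic in step 3 is what re-orders whole tiles so that the permutation is completed across the full tensor.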