| CPC G06N 3/0464 (2023.01) [G06T 1/20 (2013.01)] | 20 Claims |

|
1. A method of operating a neural processing unit having a systolic array structure, the method comprising:
determining, by a controller, that an operation performed in a first convolution layer is a transpose convolution operation;
dividing, by the controller, a kernel used for the transpose convolution operation into a plurality of sub-kernels; and
performing a convolution operation between an input feature map and each of the plurality of sub-kernels in the first convolution layer, the convolution operation performed by each of a plurality of processing elements,
wherein each of the plurality of processing elements is configured to perform a process of reusing at least one of an output feature map, each of the plurality of sub-kernels, and the input feature map, which are values stored in a local memory of each of the plurality of processing elements,
wherein the reusing process is performed by a first processing element of the plurality of processing elements transferring the stored values of the at least one of the output feature map, each of the plurality of sub-kernels, and the input feature map to a second processing element of the plurality of processing elements,
wherein the systolic array structure includes a plurality of structures arranged, in parallel, in correspondence to the values stored in the local memory, the stored values being used in successive convolution operations,
wherein a multiply-and-accumulate (MAC) operation mode of the plurality of processing elements corresponds to one of the plurality of structures and is switched based on a calculated MAC operation time, and
wherein the MAC operation mode includes an output stationary mode where the output feature map is reused, a weight stationary mode where each of the plurality of sub-kernels is reused, and an input stationary mode where the input feature map is reused.
|
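
The dividing and performing steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: it only shows, in plain Python, one common way a transpose convolution kernel can be split into stride-phase sub-kernels so that ordinary stride-1 convolutions on the unmodified input feature map reproduce the transpose convolution result. The function names, the square-kernel restriction, the absence of padding, and the choice of splitting by stride phase are all assumptions made for illustration; the claim itself does not specify how the kernel is divided.

```python
# Illustrative sketch only; all names are hypothetical and not taken from the patent.
# Assumes a square kernel, a single channel, stride s >= 1, and no padding.

def transpose_conv2d(x, k, s):
    """Reference transpose convolution: scatter each input value, scaled by k."""
    H, W, K = len(x), len(x[0]), len(k)
    out = [[0] * ((W - 1) * s + K) for _ in range((H - 1) * s + K)]
    for i in range(H):
        for j in range(W):
            for a in range(K):
                for b in range(K):
                    out[i * s + a][j * s + b] += x[i][j] * k[a][b]
    return out

def split_kernel(k, s):
    """Divide k into s*s sub-kernels: every s-th tap starting at offset (p, q)."""
    return {(p, q): [row[q::s] for row in k[p::s]]
            for p in range(s) for q in range(s)}

def full_conv2d(x, sub):
    """Stride-1 convolution of x with sub, using full zero padding."""
    H, W, Kh, Kw = len(x), len(x[0]), len(sub), len(sub[0])
    out = [[0] * (W + Kw - 1) for _ in range(H + Kh - 1)]
    for u in range(H + Kh - 1):
        for v in range(W + Kw - 1):
            for a in range(Kh):
                for b in range(Kw):
                    i, j = u - a, v - b
                    if 0 <= i < H and 0 <= j < W:
                        out[u][v] += x[i][j] * sub[a][b]
    return out

def transpose_conv2d_via_subkernels(x, k, s):
    """Run one stride-1 convolution per sub-kernel and interleave the results."""
    H, W, K = len(x), len(x[0]), len(k)
    out = [[0] * ((W - 1) * s + K) for _ in range((H - 1) * s + K)]
    for (p, q), sub in split_kernel(k, s).items():
        if not sub or not sub[0]:
            continue  # this phase has no kernel taps (only when K < s)
        y = full_conv2d(x, sub)
        for u, row in enumerate(y):
            for v, val in enumerate(row):
                out[u * s + p][v * s + q] = val  # phase (p, q) of the output
    return out

if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    k = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    # Both routes should produce the same output feature map.
    print(transpose_conv2d_via_subkernels(x, k, 2) == transpose_conv2d(x, k, 2))
```

In this sketch each sub-kernel convolution reads the same input feature map and writes a disjoint set of output positions, so no zeros need to be inserted into the input and no multiplications by inserted zeros are performed.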