US 12,141,513 B2
Method to map convolutional layers of deep neural network on a plurality of processing elements with SIMD execution units, private memories, and connected as a 2D systolic processor array
Chia-Yu Chen, Yorktown Heights, NY (US); Jungwook Choi, Chappaqua, NY (US); Kailash Gopalakrishnan, San Jose, CA (US); Vijayalakshmi Srinivasan, New York, NY (US); Swagath Venkataramani, Yonkers, NY (US); and Jintao Zhang, Princeton, NJ (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Oct. 31, 2018, as Appl. No. 16/177,017.
Prior Publication US 2020/0134105 A1, Apr. 30, 2020
Int. Cl. G06F 30/3323 (2020.01); G06F 111/04 (2020.01); G06N 3/04 (2023.01); G06N 3/063 (2023.01)

CPC G06F 30/3323 (2020.01) [G06N 3/04 (2013.01); G06N 3/063 (2013.01); G06F 2111/04 (2020.01)]

18 Claims

1. A method for improving performance of a predefined Deep Neural Network (DNN) convolution processing on a computing device, the method comprising:

inputting parameters as input data into a processor configured to, on a computer, formalize a design space exploration of a convolution mapping on a predefined DNN computer architecture that will execute the predefined DNN convolution processing, wherein the parameters are predefined as guided by a specification for the predefined DNN convolution processing to be implemented by the convolution mapping and by a microarchitectural specification for the processor that will execute the predefined DNN convolution processing;

calculating, by the processor, performance metrics for executing the predefined DNN convolution processing on a two-dimensional systolic processor, as functions of the parameters, as proxy estimates of performance of different possible design choices to implement the predefined DNN convolution processing for output, wherein the calculating, by the processor, of the performance metrics for executing the predefined DNN convolution processing is to prune invalid mapping options having calculated performance metrics that are less than minimum expected performance metrics, and architecture configurations to achieve desired performance goals, including low energy and high throughput;

determining an optimal convolution mapping onto a two-dimensional (2D) processor array for the predefined DNN convolution processing from the calculating, wherein the optimal convolution mapping includes calculated performance metrics that are greater than maximum expected performance metrics; and

performing the predefined convolution processing onto a plurality of processing elements connected as the two-dimensional processor array,

wherein three data arrays in the predefined DNN convolution processing include input, kernel, and output, such that another set is defined with three dimensions.
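
The steps of claim 1 (take convolution and array parameters as input, compute proxy performance metrics for each candidate mapping, prune candidates below a minimum, and select the best survivor) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names `ConvSpec`, `ArraySpec`, `proxy_metrics`, and `explore` are hypothetical, the candidate space is limited to choosing which two loop dimensions are spread across the rows and columns of the array, and the cost model counts only multiply-accumulate cycles while ignoring memory traffic and energy.

```python
from dataclasses import dataclass
from itertools import permutations
from math import ceil


@dataclass(frozen=True)
class ConvSpec:
    """Hypothetical convolution-layer parameters (square kernel assumed)."""
    in_channels: int
    out_channels: int
    out_height: int
    out_width: int
    kernel_size: int


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical microarchitectural parameters of a 2D processor array."""
    rows: int
    cols: int
    simd_lanes: int


# Loop dimensions that a candidate mapping may spread across rows or columns.
DIMS = ("in_channels", "out_channels", "out_height", "out_width")


def proxy_metrics(conv, array, row_dim, col_dim):
    """Estimate (cycles, utilization) for one candidate mapping."""
    sizes = {d: getattr(conv, d) for d in DIMS}
    total_macs = (conv.in_channels * conv.out_channels * conv.out_height
                  * conv.out_width * conv.kernel_size ** 2)
    # Number of passes needed to cover each mapped dimension.
    row_tiles = ceil(sizes[row_dim] / array.rows)
    col_tiles = ceil(sizes[col_dim] / array.cols)
    # Work left for each processing element, split across its SIMD lanes.
    macs_per_pe = total_macs // (sizes[row_dim] * sizes[col_dim])
    cycles = row_tiles * col_tiles * ceil(macs_per_pe / array.simd_lanes)
    peak_macs_per_cycle = array.rows * array.cols * array.simd_lanes
    utilization = total_macs / (cycles * peak_macs_per_cycle)
    return cycles, utilization


def explore(conv, array, min_utilization):
    """Prune mappings below min_utilization; return the fastest survivor."""
    survivors = []
    for row_dim, col_dim in permutations(DIMS, 2):
        cycles, utilization = proxy_metrics(conv, array, row_dim, col_dim)
        if utilization < min_utilization:
            continue  # pruned: below the minimum expected metric
        survivors.append((cycles, row_dim, col_dim, utilization))
    # Fewest estimated cycles wins; None if every candidate was pruned.
    return min(survivors, default=None)


if __name__ == "__main__":
    layer = ConvSpec(in_channels=64, out_channels=128,
                     out_height=28, out_width=28, kernel_size=3)
    pe_array = ArraySpec(rows=8, cols=8, simd_lanes=4)
    print(explore(layer, pe_array, min_utilization=0.8))
```

In this sketch the utilization threshold plays the role of the minimum expected performance metric, and estimated cycle count is the single objective used to rank the remaining mappings. A fuller exploration would add further metrics, such as an energy estimate, as additional functions of the same parameters.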