US 12,468,921 B2
Pipelining and parallelizing graph execution method for neural network model computation and apparatus thereof
Hongsheng Wang, Hangzhou (CN); Bowen Tan, Hangzhou (CN); Hujun Bao, Hangzhou (CN); and Guang Chen, Hangzhou (CN)
Assigned to ZHEJIANG LAB, Hangzhou (CN)
Filed by ZHEJIANG LAB, Hangzhou (CN)
Filed on Jun. 13, 2022, as Appl. No. 17/838,342.
Application 17/838,342 is a continuation of application No. PCT/CN2022/092481, filed on May 12, 2022.
Claims priority of application No. 202210447287.7 (CN), filed on Apr. 27, 2022.
Prior Publication US 2023/0351145 A1, Nov. 2, 2023
Int. Cl. G06N 3/04 (2023.01); G06F 9/38 (2018.01); G06F 16/901 (2019.01)

CPC G06N 3/04 (2013.01) [G06F 9/3885 (2013.01); G06F 16/9024 (2019.01)]

8 Claims

1. A pipelining and parallelizing graph execution method for neural network model computation, wherein 2*N executives are provided in a neural network model, N being a positive integer, and several memory blocks are provided in each executive; the method comprises the following steps:

S1, dividing training data into several batches of subdata;
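
Step S1 does not say how the training data are divided. A minimal Python sketch, assuming a fixed batch size, is shown below; the function name `split_into_batches` and the `batch_size` parameter are illustrative and do not come from the patent.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(data: Sequence[T], batch_size: int) -> List[List[T]]:
    """Divide training data into consecutive batches of sub-data (step S1).

    The final batch is shorter when len(data) is not a multiple of batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [list(data[start:start + batch_size]) for start in range(0, len(data), batch_size)]
```

For example, `split_into_batches(list(range(7)), 3)` returns `[[0, 1, 2], [3, 4, 5], [6]]`.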

S2, inputting the several batches of subdata into the neural network model in sequence; executing, by an nth executive, self-kernel function computation on an ith batch of subdata after the ith batch of subdata is input, and writing an execution result into an idle memory block of the nth executive; then inputting an (i+1)th batch of subdata, wherein i and n are both positive integers;
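
A minimal sketch of step S2, assuming each executive is modeled as a Python object holding a kernel function and a fixed pool of memory blocks. `Executive`, `run`, `data`, and `idle` are hypothetical names, and the "address" of a memory block is assumed to be its index in the pool.

```python
from typing import Any, Callable, List


class Executive:
    """Hypothetical executive: a kernel function plus a fixed pool of memory blocks."""

    def __init__(self, kernel: Callable[[Any], Any], num_blocks: int) -> None:
        self.kernel = kernel
        self.data: List[Any] = [None] * num_blocks   # contents of each memory block
        self.idle: List[bool] = [True] * num_blocks  # True while a block may be written

    def run(self, batch: Any) -> int:
        """Run the kernel on one batch and write the result into an idle block.

        Returns the index ("address") of the block that now holds the result.
        """
        for address, free in enumerate(self.idle):
            if free:
                self.data[address] = self.kernel(batch)
                self.idle[address] = False
                return address
        # Simplification: a real runtime would make the executive wait instead.
        raise RuntimeError("no idle memory block available")
```

Raising an error when every block is occupied keeps the sketch short; a real runtime would more likely suspend the executive until one of its blocks is reclaimed.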

S3, executing, by the nth executive, the operation in S2 on the (i+1)th batch of subdata, and sending an address of the memory block where the ith batch is located to an (n+1)th executive after the (i+1)th batch of subdata is input; parsing, by the (n+1)th executive, the memory block where the ith batch is located to obtain an execution result of the nth executive on the ith batch of subdata, executing the self-kernel function computation by taking the execution result of the nth executive as input data of the (n+1)th executive, and writing the execution result into an idle memory block of the (n+1)th executive; then inputting an (i+2)th batch of subdata;
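
Step S3 passes only the address of a memory block downstream, not the data itself. The sketch below models that hand-off under illustrative assumptions: `BlockAddress` is a hypothetical message type carrying the owning executive and a block index, and `forward` stands for the (n+1)th executive reading ("parsing") the upstream block, running its own kernel, and storing the result in one of its own idle blocks.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


class Executive:
    """Hypothetical executive owning a pool of memory blocks."""

    def __init__(self, name: str, kernel: Callable[[Any], Any], num_blocks: int) -> None:
        self.name = name
        self.kernel = kernel
        self.data: List[Any] = [None] * num_blocks
        self.idle: List[bool] = [True] * num_blocks

    def write(self, value: Any) -> int:
        # Store a result in the first idle block and return its address.
        address = self.idle.index(True)  # raises ValueError if no block is idle
        self.data[address] = value
        self.idle[address] = False
        return address


@dataclass(frozen=True)
class BlockAddress:
    """Message sent downstream: the block's owner and index, not the data."""
    owner: Executive
    address: int


def forward(upstream_msg: BlockAddress, downstream: Executive) -> BlockAddress:
    """Receiving side of step S3: parse the upstream block, compute, write locally."""
    upstream_result = upstream_msg.owner.data[upstream_msg.address]  # "parse" the block
    new_address = downstream.write(downstream.kernel(upstream_result))
    return BlockAddress(downstream, new_address)
```

`forward` leaves the upstream block marked as in use; freeing it is kept separate because the claim assigns that task to the sending executive.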

S4, executing, by the nth executive, the operation in S2 on the (i+2)th batch of subdata, and executing, by the nth executive and the (n+1)th executive, the operation in S3 on the (i+1)th batch of subdata after the (i+2)th batch of subdata is input; at the same time, sending, by the (n+1)th executive, the address of the memory block where the ith batch is located to an (n+2)th executive; parsing, by the (n+2)th executive, the memory block where the ith batch is located to obtain an execution result of the (n+1)th executive on the ith batch of subdata, executing the self-kernel function computation by taking the execution result of the (n+1)th executive as input data of the (n+2)th executive, and writing the execution result into an idle memory block of the (n+2)th executive;
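
Step S4 is what makes the execution pipelined: once the (i+2)th batch is input, three consecutive executives work on three different batches at the same time. Assuming one kernel call per executive per time step (an idealization, since real kernels differ in duration), the resulting schedule can be tabulated with the hypothetical helper `pipeline_schedule`:

```python
from typing import Dict, List


def pipeline_schedule(num_executives: int, num_batches: int) -> List[Dict[int, int]]:
    """Return, for each time step, the batch each busy executive works on.

    Assumes batch i reaches executive k exactly k steps after entering executive 0.
    Each entry maps executive index -> batch index (both 0-based).
    """
    steps: List[Dict[int, int]] = []
    for t in range(num_batches + num_executives - 1):
        active: Dict[int, int] = {}
        for k in range(num_executives):
            i = t - k  # batch that executive k handles at step t
            if 0 <= i < num_batches:
                active[k] = i
        steps.append(active)
    return steps
```

At step 2 of `pipeline_schedule(3, 4)`, executive 0 holds batch 2, executive 1 holds batch 1, and executive 2 holds batch 0, which is the situation S4 describes. With K executives and B batches the idealized pipeline finishes in B + K - 1 steps, compared with K * B steps if each batch passed through every executive before the next batch started.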

S5, reclaiming, by the nth executive, the memory block sent to the (n+1)th executive; and
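
The claim states that the nth executive reclaims the block it sent, but not how it learns that the (n+1)th executive has finished with it. The sketch below assumes an explicit `reclaim(address)` call made once the consumer is done reading; `Executive`, `produce`, `lent`, and `reclaim` are hypothetical names. With a single-block pool it also shows why reclamation matters: the producer cannot store another result until its block is returned.

```python
from typing import Any, List, Set


class Executive:
    """Hypothetical producer that lends its memory blocks to the next executive."""

    def __init__(self, num_blocks: int) -> None:
        self.data: List[Any] = [None] * num_blocks
        self.idle: List[bool] = [True] * num_blocks
        self.lent: Set[int] = set()  # addresses sent downstream and not yet reclaimed

    def produce(self, value: Any) -> int:
        """Store a result in an idle block and record it as lent downstream."""
        address = self.idle.index(True)  # raises ValueError when every block is in use
        self.data[address] = value
        self.idle[address] = False
        self.lent.add(address)
        return address

    def reclaim(self, address: int) -> None:
        """Step S5: return a lent block to the idle pool.

        Assumed to be called once the downstream executive has finished reading it;
        the claim does not specify how that completion is signalled.
        """
        if address not in self.lent:
            raise KeyError(f"block {address} is not currently lent out")
        self.lent.remove(address)
        self.data[address] = None
        self.idle[address] = True
```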

S6, executing, by the last executive, the self-kernel function computation; writing the execution result to a memory block of the last executive; and reclaiming the memory block on its own immediately at the end of the execution.
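
Taking the batches produced in S1 as input, steps S2 to S6 can be combined into a minimal end-to-end simulation under idealized assumptions: one kernel call per executive per time step, block addresses passed through a per-executive queue, the upstream block reclaimed as soon as the downstream executive has read it, and the last executive freeing its own block immediately. All names (`Executive`, `run_pipeline`, `inbox`) are hypothetical, the simulation does not enforce an even (2*N) executive count, and the two-block pool per executive is an assumption chosen so that an executive can write batch i+1 while batch i is still being read.

```python
from collections import deque
from typing import Any, Callable, Deque, List


class Executive:
    """Hypothetical pipeline stage: a kernel plus a fixed pool of memory blocks."""

    def __init__(self, kernel: Callable[[Any], Any], num_blocks: int = 2) -> None:
        self.kernel = kernel
        self.data: List[Any] = [None] * num_blocks
        self.idle: List[bool] = [True] * num_blocks
        self.inbox: Deque[int] = deque()  # upstream block addresses, in batch order

    def write(self, value: Any) -> int:
        address = self.idle.index(True)  # ValueError if no block is idle
        self.data[address] = value
        self.idle[address] = False
        return address

    def reclaim(self, address: int) -> None:
        self.data[address] = None
        self.idle[address] = True


def run_pipeline(kernels: List[Callable[[Any], Any]], batches: List[Any]) -> List[Any]:
    """Simulate S2-S6 with one kernel call per executive per time step."""
    stages = [Executive(k) for k in kernels]
    outputs: List[Any] = []
    last = len(stages) - 1
    for t in range(len(batches) + last):
        for k, stage in enumerate(stages):
            i = t - k  # batch handled by executive k at step t
            if not 0 <= i < len(batches):
                continue
            if k == 0:
                x = batches[i]  # S2: the first executive reads the raw batch
            else:
                upstream = stages[k - 1]
                src = stage.inbox.popleft()  # S3: address received from executive k-1
                x = upstream.data[src]  # "parse" the upstream memory block
                upstream.reclaim(src)  # S5: executive k-1 reclaims the block it sent
            address = stage.write(stage.kernel(x))
            if k == last:
                outputs.append(stage.data[address])
                stage.reclaim(address)  # S6: the last executive frees its block at once
            else:
                stages[k + 1].inbox.append(address)  # send the address downstream
    return outputs
```

For example, `run_pipeline([lambda b: sum(b), lambda x: x * 2, lambda x: x + 1], [[1, 2], [3, 4], [5]])` returns `[7, 15, 11]`, the same values obtained by applying the three kernels to each batch in turn.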