CPC G06F 9/5027 (2013.01) [G06F 1/0307 (2013.01)] — 20 Claims

1. A computer-implemented method for accelerating inference of a neural network model, including:
acquiring image training data, text training data, or speech training data;
determining a first neural network model to be accelerated;
converting a preset operation on a preset network layer in the first neural network model to a first operation for simulating operational logic of a target operation, to obtain a second neural network model;
performing, based on the image training data, the text training data, or the speech training data, quantization-aware training on the second neural network model with a preset bit width to obtain a quantized third neural network model, wherein the preset bit width is a bit width supported by a neural network accelerator for data processing; and
converting the first operation of the third neural network model to the target operation, to obtain an accelerated target neural network model corresponding to the first neural network model.
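The core of the claimed flow is that, during training, a floating-point "first operation" stands in for the accelerator's low-bit-width "target operation", so the network learns under the quantization error it will see at inference. A minimal sketch of such a simulated (fake) quantization step is below; the function name, the affine quantization scheme, and all parameters are illustrative assumptions, not language from the claim.

```python
def fake_quantize(values, num_bits=8):
    """Simulate low-bit-width integer quantization in floating point.

    This plays the role of the 'first operation': it mimics the operational
    logic of the accelerator's integer 'target operation' (quantize, clamp,
    dequantize) so that quantization error is visible during training.
    """
    qmin, qmax = 0, (1 << num_bits) - 1          # e.g. 0..255 for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0     # avoid zero scale
    zero_point = round(qmin - lo / scale)
    # Quantize and clamp to the preset bit width supported by the accelerator.
    quantized = [min(max(round(v / scale + zero_point), qmin), qmax)
                 for v in values]
    # Dequantize back to float so surrounding layers train on the rounded values.
    return [(q - zero_point) * scale for q in quantized]
```

After training converges, the sketch's `fake_quantize` would be replaced by the accelerator's true integer operation (the claim's final converting step), with the learned scale and zero point carried over.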