CPC G06F 9/5027 (2013.01) [G06F 1/0307 (2013.01)] — 20 Claims

1. A computer-implemented method for accelerating inference of a neural network model, including:
acquiring image training data, text training data, or speech training data;
determining a first neural network model to be accelerated;
converting a preset operation on a preset network layer in the first neural network model to a first operation for simulating operational logic of a target operation, to obtain a second neural network model;
performing, based on the image training data, the text training data, or the speech training data, quantization-aware training on the second neural network model with a preset bit width to obtain a quantized third neural network model, wherein the preset bit width is a bit width supported by a neural network accelerator for data processing; and
converting the first operation of the third neural network model to the target operation, to obtain an accelerated target neural network model corresponding to the first neural network model.
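The core of the claimed flow is that, during training, a floating-point "first operation" stands in for the accelerator's low-bit-width "target operation", so the network learns under the quantization error it will see at inference. A minimal sketch of such a simulated (fake) quantization step is below; the function name, the affine quantization scheme, and all parameters are illustrative assumptions, not language from the claim.

```python
def fake_quantize(values, num_bits=8):
    """Simulate low-bit-width integer quantization in floating point.

    This plays the role of the 'first operation': it mimics the operational
    logic of the accelerator's integer 'target operation' (quantize, clamp,
    dequantize) so that quantization error is visible during training.
    """
    qmin, qmax = 0, (1 << num_bits) - 1          # e.g. 0..255 for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0     # avoid zero scale
    zero_point = round(qmin - lo / scale)
    # Quantize and clamp to the preset bit width supported by the accelerator.
    quantized = [min(max(round(v / scale + zero_point), qmin), qmax)
                 for v in values]
    # Dequantize back to float so surrounding layers train on the rounded values.
    return [(q - zero_point) * scale for q in quantized]
```

After training converges, the sketch's `fake_quantize` would be replaced by the accelerator's true integer operation (the claim's final converting step), with the learned scale and zero point carried over.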