CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)] | 14 Claims |
1. A neural network quantization method, comprising:
obtaining a weight and input data of a target quantization layer of an original neural network, wherein the target quantization layer includes at least one computation layer of the original neural network;
determining a quantization parameter of a weight of a corresponding layer by using the weight of the target quantization layer of the original neural network;
determining a quantization parameter of input data of a corresponding layer by using the input data of the target quantization layer of the original neural network, wherein both the weight and the input data of the target quantization layer follow a principle of not distorting a maximum absolute value; and
quantizing the target quantization layer of the original neural network according to the quantization parameter of the weight and the quantization parameter of the input data to generate a quantized weight and quantized input data.
|
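The steps recited in claim 1 can be illustrated with a minimal sketch. It is one possible reading, not the claimed implementation: it assumes that the "quantization parameter" is a single per-layer scale factor, that quantization is symmetric to signed 8-bit integers, and that "not distorting a maximum absolute value" means the scale is chosen so the largest-magnitude element maps onto the largest integer code without being clipped. The function names (`max_abs_scale`, `quantize`, `quantize_layer`) and the `bits` parameter are hypothetical and do not appear in the claim.

```python
from typing import List, Tuple


def max_abs_scale(values: List[float], bits: int = 8) -> float:
    """Assumed quantization parameter: a scale that maps the largest
    absolute value in `values` onto the largest signed integer code."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to a scale of 1.0 for an all-zero (or empty) tensor.
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float, bits: int = 8) -> List[int]:
    """Round each value to the nearest integer code and clamp it to the
    symmetric range [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def quantize_layer(
    weights: List[float], inputs: List[float], bits: int = 8
) -> Tuple[List[int], float, List[int], float]:
    """Derive one scale from the layer's weights and one from its input
    data, then quantize each with its own scale."""
    w_scale = max_abs_scale(weights, bits)
    x_scale = max_abs_scale(inputs, bits)
    return (
        quantize(weights, w_scale, bits),
        w_scale,
        quantize(inputs, x_scale, bits),
        x_scale,
    )


if __name__ == "__main__":
    q_w, w_scale, q_x, x_scale = quantize_layer(
        weights=[0.5, -1.27, 0.01], inputs=[2.0, -4.0, 1.0]
    )
    print(q_w, w_scale)
    print(q_x, x_scale)
```

Under these assumptions, the element with the largest magnitude in each tensor lands exactly on the code ±127, so multiplying it back by the scale recovers the original maximum up to floating-point error. Other elements incur ordinary rounding error.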