CPC G06F 11/1476 (2013.01) [G06N 3/047 (2023.01); G06N 3/08 (2013.01); G06F 2201/81 (2013.01); G06F 2201/865 (2013.01)] | 20 Claims |
1. A method for quantizing data in a neural network using neural network quantization parameters, comprising:
obtaining an analyzing result of each type of the data to be quantized, wherein the data includes at least one type of neurons, weights, gradients, or biases of the neural network;
determining a corresponding quantization parameter according to the analyzing result of each type of the data to be quantized and a data bit width corresponding to the data to be quantized;
quantizing the data using the corresponding quantization parameter to obtain quantized data;
performing inverse quantization on the quantized data to obtain inverse quantized data, wherein a data format of the inverse quantized data is the same as that of the corresponding pre-quantized data; and
determining a quantization error based on the quantized data and the inverse quantized data.
|
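The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It rests on several assumptions the claim does not fix: the analyzing result is taken to be the maximum absolute value of each data type, the quantization parameter is a single symmetric scale factor, and the quantization error is the mean absolute difference between the pre-quantized data and the inverse quantized data. All function names (`analyze`, `quantization_scale`, `quantize`, `dequantize`, `quantization_error`, `quantize_by_type`) are hypothetical.

```python
from typing import Dict, List, Tuple


def analyze(data: List[float]) -> float:
    """Analyzing result (assumed here): the maximum absolute value."""
    return max((abs(x) for x in data), default=0.0)


def quantization_scale(abs_max: float, bit_width: int) -> float:
    """Derive a scale from the analyzing result and the data bit width."""
    q_max = 2 ** (bit_width - 1) - 1  # largest signed integer code
    return abs_max / q_max if abs_max > 0 else 1.0


def quantize(data: List[float], scale: float, bit_width: int) -> List[int]:
    """Map each value to a signed integer code, clamped to the bit width."""
    q_max = 2 ** (bit_width - 1) - 1
    # Python's round() rounds exact halves to the nearest even integer.
    return [max(-q_max, min(q_max, round(x / scale))) for x in data]


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Inverse quantization: back to the pre-quantized format (float)."""
    return [float(q) * scale for q in codes]


def quantization_error(reference: List[float], restored: List[float]) -> float:
    """Assumed error metric: mean absolute difference."""
    if not reference:
        return 0.0
    return sum(abs(a - b) for a, b in zip(reference, restored)) / len(reference)


def quantize_by_type(
    tensors: Dict[str, List[float]], bit_width: int = 8
) -> Dict[str, Tuple[List[int], List[float], float]]:
    """Run analyze, quantize, dequantize, and error for each data type."""
    results = {}
    for kind, data in tensors.items():
        scale = quantization_scale(analyze(data), bit_width)
        codes = quantize(data, scale, bit_width)
        restored = dequantize(codes, scale)
        results[kind] = (codes, restored, quantization_error(data, restored))
    return results


if __name__ == "__main__":
    out = quantize_by_type({"weights": [0.5, -1.0, 0.25], "biases": [0.0, 0.0]})
    for kind, (codes, restored, err) in out.items():
        print(kind, codes, restored, err)
```

Each data type receives its own scale because the analyzing result is computed per type, which mirrors the claim's "each type of the data to be quantized". In this sketch a larger `bit_width` yields a smaller scale and therefore a smaller error.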