CPC G06F 11/1476 (2013.01) [G06N 3/047 (2023.01); G06N 3/08 (2013.01); G06F 2201/81 (2013.01); G06F 2201/865 (2013.01)] | 15 Claims |
1. A method for adjusting a data bit width in a convolutional neural network layer during a neural network computation, comprising:
obtaining a data bit width used to perform a quantization on data to be quantized, wherein the data to be quantized includes at least one of neurons, weights, gradients, or biases, and the data bit width indicates a bit width of quantized data obtained after the data to be quantized is quantized;
performing the quantization on a group of the data to be quantized based on the data bit width to convert the group of data to be quantized into a group of quantized data, wherein the group of quantized data has the data bit width;
comparing the group of data to be quantized with the group of quantized data to determine a quantization error correlated with the data bit width;
adjusting the data bit width based on the determined quantization error; and
applying the adjusted data bit width during quantization in the convolutional neural network layer.
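
A minimal Python sketch of one possible realization of the claimed steps, assuming symmetric linear quantization to signed integers and a mean-absolute-error metric for the quantization error; the names and values error_threshold, step, and max_bits are illustrative assumptions, not limitations recited in the claim.

import numpy as np

def quantize(data, bit_width):
    # Symmetric linear quantization of `data` to signed integers of `bit_width` bits
    # (one possible quantization scheme; the claim does not fix a particular formula).
    qmax = 2 ** (bit_width - 1) - 1
    max_abs = float(np.max(np.abs(data)))
    if max_abs == 0.0:
        return np.zeros_like(data), 1.0
    scale = max_abs / qmax
    quantized = np.clip(np.round(data / scale), -qmax - 1, qmax)
    return quantized, scale

def quantization_error(data, quantized, scale):
    # Compare the original data with its dequantized form to obtain an error
    # correlated with the data bit width (here: mean absolute error, an assumption).
    return float(np.mean(np.abs(data - quantized * scale)))

def adjust_bit_width(data, bit_width, error_threshold=1e-3, step=1, max_bits=16):
    # Increase the data bit width while the quantization error exceeds the threshold.
    quantized, scale = quantize(data, bit_width)
    error = quantization_error(data, quantized, scale)
    while error > error_threshold and bit_width < max_bits:
        bit_width += step
        quantized, scale = quantize(data, bit_width)
        error = quantization_error(data, quantized, scale)
    return bit_width

# Usage example: adjust the bit width for a convolutional layer's weights, then
# apply the adjusted bit width when quantizing that layer.
weights = np.random.randn(64, 3, 3, 3).astype(np.float32)
adjusted_bit_width = adjust_bit_width(weights, bit_width=8)
quantized_weights, weight_scale = quantize(weights, adjusted_bit_width)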