US 11,676,029 B2
Neural network quantization parameter determination method and related products
Shaoli Liu, Shanghai (CN); Xiaofu Meng, Shanghai (CN); Xishan Zhang, Shanghai (CN); and Jiaming Guo, Shanghai (CN)
Assigned to SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD, Shanghai (CN)
Filed by SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD, Shanghai (CN)
Filed on Dec. 19, 2019, as Appl. No. 16/720,113.
Application 16/720,113 is a continuation of application No. 16/622,541, previously published as PCT/CN2019/106754, filed on Sep. 19, 2019.
Claims priority of application No. 201910505239.7 (CN), filed on Jun. 12, 2019; application No. 201910515355.7 (CN), filed on Jun. 14, 2019; application No. 201910528537.8 (CN), filed on Jun. 18, 2019; and application No. 201910570125.0 (CN), filed on Jun. 27, 2019.
Prior Publication US 2020/0394523 A1, Dec. 17, 2020
Int. Cl. G06N 3/084 (2023.01); G06F 17/18 (2006.01); G06N 20/00 (2019.01)
CPC G06N 3/084 (2013.01) [G06F 17/18 (2013.01); G06N 20/00 (2019.01)]
22 Claims
 
1. A method for determining quantization parameters in a neural network, comprising:
obtaining an analyzing result of each type of data to be quantized, wherein the data to be quantized includes at least one type of data among neurons, weights, gradients, or biases of the neural network; and
determining a corresponding quantization parameter according to the analyzing result of each type of the data to be quantized and a data bit width corresponding to the data to be quantized, wherein the quantization parameter is used by an artificial intelligence processor to perform corresponding quantization on data involved in a process of neural network operation; wherein the quantization parameter is a first scaling coefficient and quantizing target data by using the corresponding quantization parameter, wherein a feature of the target data is similar to that of the data to be quantized,
wherein the analyzing result is a maximum value and a minimum value of, or a maximum absolute value of, each type of data to be quantized,
wherein the maximum absolute value is determined according to the maximum value and the minimum value of each type of data to be quantized,
wherein the quantization parameter is determined according to either the maximum value of each type of data to be quantized and the minimum value of each type of data to be quantized, or the maximum absolute value of each type of data, together with the data bit width.
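The relationship recited in claim 1 between the analyzing result, the data bit width, and a scaling coefficient can be illustrated with a minimal sketch. It assumes a common symmetric scheme in which the scale is the maximum absolute value divided by the largest representable signed integer, 2^(n-1) - 1; the claim itself does not state this formula, so the computation is an assumption. The function names (`max_abs_from_range`, `scaling_coefficient`, `quantize`) are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def max_abs_from_range(min_val: float, max_val: float) -> float:
    """Derive the maximum absolute value from the minimum and maximum values."""
    return max(abs(min_val), abs(max_val))


def scaling_coefficient(abs_max: float, bit_width: int) -> float:
    """Assumed symmetric scale: abs_max mapped onto the largest signed integer.

    This formula is an illustrative assumption, not the patent's definition
    of the first scaling coefficient.
    """
    if bit_width < 2:
        raise ValueError("bit_width must be at least 2 for signed quantization")
    q_max = 2 ** (bit_width - 1) - 1
    if abs_max == 0:
        # All-zero data: any positive scale works; 1.0 avoids division by zero.
        return 1.0
    return abs_max / q_max


def quantize(values: Sequence[float], scale: float, bit_width: int) -> List[int]:
    """Round each value to the nearest integer step and clamp to the signed range."""
    q_max = 2 ** (bit_width - 1) - 1
    return [max(-q_max, min(q_max, round(v / scale))) for v in values]


# Example: statistics gathered from data to be quantized (weights, say) are
# reused to quantize target data with similar features.
abs_max = max_abs_from_range(-127.0, 64.0)
scale = scaling_coefficient(abs_max, bit_width=8)
quantized = quantize([-127.0, 0.4, 200.0], scale, bit_width=8)
```

In this sketch, values outside the observed range are clamped rather than rescaled, which is one possible design choice when the target data only approximately shares the statistics of the data that was analyzed.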