US 12,014,273 B2
Low precision and coarse-to-fine dynamic fixed-point quantization design in convolution neural network
Jie Wu, San Diego, CA (US); Yunhan Ma, San Diego, CA (US); Bike Xie, San Diego, CA (US); Hsiang-Tsun Li, Taichung (TW); Junjie Su, San Diego, CA (US); and Chun-Chen Liu, San Diego, CA (US)
Assigned to Kneron (Taiwan) Co., Ltd., Taipei (TW)
Filed by Kneron (Taiwan) Co., Ltd., Taipei (TW)
Filed on Aug. 27, 2019, as Appl. No. 16/551,753.
Claims priority of provisional application 62/778,299, filed on Dec. 12, 2018.
Prior Publication US 2020/0193270 A1, Jun. 18, 2020
Int. Cl. G06N 3/082 (2023.01); G06N 3/045 (2023.01)
CPC G06N 3/082 (2013.01) [G06N 3/045 (2013.01)] 6 Claims
 
1. A method of quantizing a floating pre-trained convolution neural network (CNN) model comprising:
inputting input data to the floating pre-trained CNN model to generate floating feature maps for each layer of the floating pre-trained CNN model;
inputting the floating feature maps to a statistical analysis simulator to generate a dynamic quantization range for each layer of the floating pre-trained CNN model; and
quantizing the floating pre-trained CNN model according to the dynamic quantization range for each layer of the floating pre-trained CNN model to generate a quantized CNN model, a scalar factor of each layer of the floating pre-trained CNN model, and a fractional bit-width of the quantized CNN model, wherein quantizing the floating pre-trained CNN model comprises:
acquiring a plurality of weights of each layer of the floating pre-trained CNN model;
setting the scalar factor of each layer of the floating pre-trained CNN model according to a maximum weight of the plurality of weights and a minimum weight of the plurality of weights;
applying the scalar factor of each layer of the floating pre-trained CNN model to an activation vector at each layer of the floating pre-trained CNN model; and
minimizing a quantization error of each layer of the quantized CNN model according to the scalar factor by using a minimum mean square error approach as

$$s^{(l)} = \arg\min_{s} \frac{1}{M} \sum_{i=1}^{M} \left\| x_i^{(l)} - s \, Q\!\left( \frac{x_i^{(l)}}{s} \right) \right\|^2$$

wherein $s^{(l)}$ is the scalar factor at an $l$-th layer, $x_i^{(l)}$ represents output features in an $i$-th channel at the $l$-th layer, $Q(\cdot)$ is a quantization function, and $M$ is a total number of channels;
wherein the scalar factor of each layer of the floating pre-trained CNN model is associated with a quantization bit-width and the dynamic quantization range when quantizing the floating pre-trained CNN model.
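The coarse-to-fine flow recited in claim 1 can be illustrated with a short sketch. The following Python is a minimal, hypothetical rendering, not the patented implementation: the statistical analysis simulator is reduced to an observed min/max over the floating feature maps, the quantization function $Q(\cdot)$ is assumed to be signed fixed-point rounding with a given fractional bit-width, and the MMSE step is solved by a simple grid search around the coarse scalar. All function names, the 8-bit width, and the search bounds are assumptions not taken from the patent.

```python
import numpy as np

BITS = 8  # assumed quantization bit-width; the claim leaves this open

def fixed_point_quantize(x, frac_bits, bits=BITS):
    """Q(.): assumed signed fixed-point rounding with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(x * scale), qmin, qmax) / scale

def dynamic_range_from_feature_maps(feature_maps):
    """Stand-in for the statistical analysis simulator: derive a per-layer
    dynamic quantization range from the floating feature maps (here, min/max)."""
    lo = min(float(fm.min()) for fm in feature_maps)
    hi = max(float(fm.max()) for fm in feature_maps)
    return lo, hi

def coarse_scalar_from_weights(weights):
    """Coarse step: set the layer's scalar factor from the maximum weight and
    the minimum weight of the layer, as recited in the claim."""
    return max(abs(float(weights.max())), abs(float(weights.min())))

def fine_scalar_mmse(feature_maps, s_coarse, frac_bits, num_candidates=64):
    """Fine step: refine s by minimizing the mean square quantization error
    over the M channel feature maps:
        argmin_s (1/M) * sum_i || x_i - s * Q(x_i / s) ||^2
    The grid-search range [0.5*s, 2*s] is an assumption for illustration."""
    best_s, best_err = s_coarse, np.inf
    for s in np.linspace(0.5 * s_coarse, 2.0 * s_coarse, num_candidates):
        err = np.mean([np.sum((x - s * fixed_point_quantize(x / s, frac_bits)) ** 2)
                       for x in feature_maps])
        if err < best_err:
            best_s, best_err = float(s), float(err)
    return best_s
```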
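Continuing the sketch above, a single layer would be processed roughly as follows; the tensor shapes, the channel count M = 8, and frac_bits = 5 are arbitrary illustrative choices.

```python
# continues the sketch above (uses the functions defined there)
rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 3, 3, 3))                      # one layer's weights
feature_maps = [rng.normal(size=(32, 32)) for _ in range(8)]  # M = 8 channels

lo, hi = dynamic_range_from_feature_maps(feature_maps)   # dynamic quantization range
s0 = coarse_scalar_from_weights(weights)                 # coarse scalar factor
s = fine_scalar_mmse(feature_maps, s0, frac_bits=5)      # MMSE-refined scalar factor
```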