US 11,657,254 B2
Computation method and device used in a convolutional neural network
Jie Pan, Beijing (CN); and Xu Wang, Beijing (CN)
Assigned to GLENFLY TECH CO., LTD., Shanghai (CN)
Filed by VIA Alliance Semiconductor Co., Ltd., Shanghai (CN)
Filed on Aug. 10, 2017, as Appl. No. 15/673,774.
Claims priority of application No. 201710417495.1 (CN), filed on Jun. 6, 2017.
Prior Publication US 2018/0349758 A1, Dec. 6, 2018
Int. Cl. G06N 3/045 (2023.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/045 (2023.01) [G06N 3/063 (2013.01); G06N 3/08 (2013.01)] 21 Claims
OG exemplary drawing
 
1. A computation method implemented in a convolutional neural network of an electronic computing device, comprising:
receiving original data;
determining a first optimal quantization step size according to a distribution of the original data, wherein the step of determining the first optimal quantization step size comprises:
calculating a mean and a variance of the distribution of the original data;
calculating a first quantization parameter according to the mean and variance of the distribution of the original data; and
determining the first optimal quantization step size according to the first quantization parameter;
performing fixed-point processing to the original data according to the first optimal quantization step size to generate first data;
training the convolutional neural network using a training data set;
inputting the first data to a first layer of the convolutional neural network to generate first output data;
determining a second optimal quantization step size according to a distribution of the first output data, wherein the step of determining the second optimal quantization step size comprises:
calculating a mean and a variance of the distribution of the first output data;
calculating a second quantization parameter according to the mean and variance of the distribution of the first output data; and
determining the second optimal quantization step size according to the second quantization parameter;
performing the fixed-point processing to the first output data according to the second optimal quantization step size to generate second data; and
inputting the second data to a second layer of the convolutional neural network;
wherein before performing the fixed-point processing to the first output data according to the second optimal quantization step size, the first output data is output to a rectified linear unit (ReLU) layer;
wherein the ReLU layer is implemented by using a Sigmoid function or a Tanh function;
wherein the step of determining the first/second optimal quantization step size further comprises:
determining a fixed-point format of the data to be quantized according to the first/second optimal quantization step size, wherein the fixed-point format includes a number of bits for a sign part, a number of bits for an integer part and a number of bits for a fraction part;
wherein the number of bits for an integer part is m and the number of bits for a fraction part is n, wherein m and n are expressed as:
m = log₂((M−1)×Δ/2),
n = −log₂(Δ),
where M is a quantization level and Δ is the first/second optimal quantization step size.
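The claim above describes one quantization pass per layer: estimate the data distribution from its mean and variance, derive a quantization parameter and an optimal step size Δ, cast the data into a sign/integer/fraction fixed-point format, and repeat for each layer's output. The Python sketch below illustrates that flow under stated assumptions: the mapping from the mean and variance to the quantization parameter (a clipping range of k standard deviations), the power-of-two constraint on Δ, and the 8-bit quantization level M are illustrative choices not specified in the claim, while the m and n expressions follow the claim directly. The helper names (optimal_step_size, fixed_point_format, fixed_point) are hypothetical.

```python
import numpy as np

def optimal_step_size(data, num_bits=8, k=3.0):
    """Estimate a quantization step size from the data distribution.

    The claim computes a mean and a variance, derives a quantization
    parameter from them, and picks the step size from that parameter.
    The exact mapping is not given, so the parameter is taken here
    (hypothetically) as a clipping range of k standard deviations.
    """
    mean = float(np.mean(data))
    var = float(np.var(data))
    M = 2 ** num_bits                              # quantization level M (assumed 8-bit)
    quant_param = abs(mean) + k * np.sqrt(var)     # assumed quantization parameter
    delta = 2.0 * quant_param / (M - 1)            # step size covering the assumed range
    delta = 2.0 ** np.floor(np.log2(delta))        # power of two, so n below is an integer
    return delta, M

def fixed_point_format(delta, M):
    """Fixed-point format from the claim: one sign bit, m integer bits,
    n fraction bits, with m = log2((M-1)*delta/2) and n = -log2(delta)."""
    m = int(np.ceil(np.log2((M - 1) * delta / 2.0)))
    n = int(round(-np.log2(delta)))
    return 1, m, n                                  # (sign bits, integer bits, fraction bits)

def fixed_point(data, delta, M):
    """Fixed-point processing: round to multiples of delta, clip to M levels."""
    q = np.round(np.asarray(data, dtype=np.float64) / delta)
    half = (M - 1) // 2
    return np.clip(q, -half, half) * delta

# Example: quantize original data, run a layer, then re-quantize its output.
rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, size=4096)               # stand-in for the original data
delta1, M = optimal_step_size(original)
first_data = fixed_point(original, delta1, M)
print("first step size:", delta1, "format (s, m, n):", fixed_point_format(delta1, M))

first_output = np.maximum(first_data * 1.7 + 0.3, 0.0)   # stand-in for first layer + ReLU
delta2, _ = optimal_step_size(first_output)
second_data = fixed_point(first_output, delta2, M)
print("second step size:", delta2, "format (s, m, n):", fixed_point_format(delta2, M))
```

With the choices above, the sign, integer, and fraction bit counts sum to the assumed word length (for example, 1 + 1 + 6 = 8 bits when Δ = 2⁻⁶ and M = 256), which is the point of constraining Δ to a power of two in this sketch.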