CPC G06N 3/045 (2023.01) [G06N 3/063 (2013.01); G06N 3/08 (2013.01)] | 21 Claims |
1. A computation method implemented in a convolutional neural network of an electronic computing device, comprising:
receiving original data;
determining a first optimal quantization step size according to a distribution of the original data, wherein the step of determining the first optimal quantization step size comprises:
calculating a mean and a variance of the distribution of the original data; calculating a first quantization parameter according to the mean and variance of the distribution of the original data; and determining the first optimal quantization step size according to the first quantization parameter;
performing fixed-point processing to the original data according to the first optimal quantization step size to generate first data;
training the convolutional neural network using a training data set;
inputting the first data to a first layer of the convolutional neural network to generate first output data;
determining a second optimal quantization step size according to a distribution of the first output data, wherein the step of determining the second optimal quantization step size comprises:
calculating a mean and a variance of the distribution of the first output data; calculating a second quantization parameter according to the mean and variance of the distribution of the first output data; and determining the second optimal quantization step size according to the second quantization parameter;
performing the fixed-point processing to the first output data according to the second optimal quantization step size to generate second data; and
inputting the second data to a second layer of the convolutional neural network;
wherein before performing the fixed-point processing to the first output data according to the second optimal quantization step size, the first output data is output to a rectified linear unit (ReLU) layer;
wherein the ReLU layer is implemented by using a Sigmoid function or a Tanh function;
wherein the step of determining the first/second optimal quantization step size further comprises:
determining a fixed-point format of the data to be quantized according to the first/second optimal quantization step size, wherein the fixed-point format includes a number of bits for a sign part, a number of bits for an integer part and a number of bits for a fraction part;
wherein the number of bits for an integer part is m, the number of bits for a fraction part is n, wherein m and n are expressed as:
m = log₂((M−1)×Δ/2),
n = −log₂(Δ),
where M is a quantization level and Δ is the first/second optimal quantization step size.
|
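The expressions for m and n in claim 1 can be illustrated with a minimal Python sketch. It takes the quantization level M and the step size Δ as given inputs, because the claim does not say how the quantization parameter is derived from the mean and variance. The function names `fixed_point_format` and `to_fixed_point`, the rounding of m up to a whole number of bits, and the two's-complement saturation are assumptions made for illustration only; none of them is recited in the claim.

```python
import math


def fixed_point_format(num_levels: int, step: float) -> tuple[float, float]:
    """Evaluate the claim's expressions for m and n without rounding.

    num_levels is M (the quantization level) and step is the step size.
    """
    m = math.log2((num_levels - 1) * step / 2)  # bits for the integer part
    n = -math.log2(step)                        # bits for the fraction part
    return m, n


def to_fixed_point(x: float, int_bits: int, frac_bits: int) -> float:
    """Quantize x to a signed fixed-point value (assumed two's complement).

    One sign bit, int_bits integer bits and frac_bits fraction bits are
    assumed; out-of-range values saturate instead of wrapping.
    """
    scale = 2 ** frac_bits
    lo = -(2 ** (int_bits + frac_bits))      # most negative integer code
    hi = 2 ** (int_bits + frac_bits) - 1     # most positive integer code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


# Example: M = 256 levels and a step size of 2**-4 = 0.0625.
m, n = fixed_point_format(256, 0.0625)
int_bits = math.ceil(m)   # assumption: round m up to a whole number of bits
frac_bits = round(n)      # n is exactly 4 here because the step is a power of two
print(m, n, int_bits, frac_bits)
print(to_fixed_point(1.3, int_bits, frac_bits))
```

With these example values the format has 1 sign bit, 3 integer bits and 4 fraction bits, for 8 bits in total, which matches M = 256 levels. When Δ is not a power of two, n is not an integer, and how to round it is a design choice that the claim text does not settle.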