CPC G06N 3/063 (2013.01) [G06N 3/08 (2013.01)] | 16 Claims |
1. A neural network quantization device comprising:
one or more processors; and
memories for storing data;
wherein the one or more processors comprise a control module and a processing module; and the processing module includes a first operation sub-module which includes a main operation sub-module and a secondary operation sub-module;
wherein the control module is configured to determine a plurality of pieces of data to be quantized from target data of a neural network, and obtain a quantization result of the target data according to a piece of quantized data corresponding to each piece of data to be quantized, wherein the piece of quantized data corresponding to each piece of data to be quantized is obtained by quantizing the piece of data to be quantized using a corresponding quantization parameter, wherein the quantization parameter includes a point location;
wherein the first operation sub-module is configured to perform an operation related to the quantization result to obtain an operation result;
wherein the main operation sub-module is configured to send first data to the secondary operation sub-module, wherein the first data includes first-style data, which is in the quantization result and is obtained by quantizing according to the point location;
wherein the secondary operation sub-module is configured to perform multiplication on the received first data to obtain an intermediate result;
wherein the control module includes a parameter determination sub-module configured to compute the corresponding quantization parameter according to each piece of data to be quantized and a corresponding data bit width;
wherein the parameter determination sub-module includes:
a first point location determination sub-module configured to, when the quantization parameter does not include an offset, obtain a first-kind point location of each piece of data to be quantized according to a maximum value of an absolute value of each piece of data to be quantized and the corresponding data bit width;
a first maximum value determination sub-module configured to, when the quantization parameter does not include the offset, obtain a maximum value of the piece of quantized data according to each piece of data to be quantized and the corresponding data bit width;
a first scale factor determination sub-module configured to obtain a first-kind scale factor of each piece of data to be quantized according to a maximum value of an absolute value of each piece of data to be quantized and the maximum value of the piece of quantized data;
a second point location determination sub-module configured to, when the quantization parameter includes the offset, obtain a second-kind point location of each piece of data to be quantized according to a maximum value and a minimum value of each piece of data to be quantized and the corresponding data bit width;
a second maximum value determination sub-module configured to, when the quantization parameter includes the offset, obtain a maximum value of the piece of quantized data according to each piece of data to be quantized and the corresponding data bit width;
a second scale factor determination sub-module configured to obtain a second-kind scale factor of each piece of data to be quantized according to a maximum value and a minimum value of the piece of data to be quantized, and the maximum value of the piece of quantized data;
an offset determination sub-module configured to obtain the offset of each piece of data to be quantized according to a maximum value and a minimum value of each piece of data to be quantized;
wherein the main operation sub-module is also configured to perform an operation on data except for the first data in the intermediate result and the quantization result to obtain the operation result; and
wherein the data to be quantized is quantized according to the corresponding quantization parameter, so as to reduce the storage space of the data while ensuring precision, to ensure the accuracy and reliability of the operation result, and to increase operation efficiency.
|
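Illustrative sketch (not part of the claim): claim 1 names the quantization parameters (point location, scale factor, offset) and the inputs each is derived from, but does not state formulas. The Python below assumes one common power-of-two fixed-point scheme, in which the point location is the smallest exponent whose range covers the data, the maximum value of the quantized data is the largest value representable at that exponent, and the scale factor is the ratio between the data range and that maximum. All function names and formulas here are assumptions chosen for illustration.

```python
import math


def _point_and_scale(z, bit_width):
    """Shared step (assumed formulas): derive point location, the maximum
    value of the quantized data, and the scale factor from a half-range z."""
    if z <= 0:
        raise ValueError("data range must be non-zero")
    q_max = 2 ** (bit_width - 1) - 1        # largest signed integer code
    s = math.ceil(math.log2(z / q_max))     # point location (power-of-two exponent)
    a = (2 ** s) * q_max                    # maximum value of the quantized data
    f = z / a                               # scale factor, in (0.5, 1]
    return s, a, f


def params_without_offset(data, bit_width):
    """No-offset case: uses the maximum absolute value of the data."""
    z = max(abs(x) for x in data)
    return _point_and_scale(z, bit_width)


def params_with_offset(data, bit_width):
    """Offset case: uses the maximum and minimum values of the data."""
    hi, lo = max(data), min(data)
    offset = (hi + lo) / 2                  # assumed: midpoint of the data range
    s, a, f = _point_and_scale((hi - lo) / 2, bit_width)
    return s, a, f, offset


def quantize(data, s, f=1.0, offset=0.0):
    """Map each value to an integer code (Python's round, half to even)."""
    step = (2 ** s) * f
    return [round((x - offset) / step) for x in data]


def dequantize(codes, s, f=1.0, offset=0.0):
    """Recover approximate real values from integer codes."""
    step = (2 ** s) * f
    return [c * step + offset for c in codes]
```

Under these assumptions, calling `params_without_offset(data, 8)` and passing the returned point location and scale factor to `quantize` yields integer codes that fit in 8 bits. The offset variant re-centers asymmetric data before quantizing, so the full code range is used.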