CPC G06N 3/084 (2013.01) [G06N 3/04 (2013.01); G06N 3/0495 (2023.01)] | 20 Claims |
1. A processor-implemented method for an electronic device including a memory in a processor of the electronic device, the method comprising:
obtaining local quantization data by firstly quantizing, based on a local maximum value for each output channel of a current feedforward layer among sequential feedforward layers of a neural network, global recovery data obtained by recovering output data of an operation of the current layer based on a global maximum value output from a previous feedforward layer immediately preceding the current layer among the sequential feedforward layers;
storing the local quantization data in the memory to perform an operation of a next layer of the neural network;
obtaining global quantization data by secondarily quantizing, based on a global maximum value corresponding to the current layer, local recovery data obtained by recovering the local quantization data based on the local maximum value for each output channel of the current layer; and
providing the global quantization data as input data for the operation of the next layer,
wherein the global recovery data is sequentially quantized for each output channel of the current layer, a resource used by the memory for the operation is reduced compared to using the memory for all output channels of the current layer.
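The data flow recited in claim 1 can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the patented implementation: symmetric signed 8-bit quantization (`QMAX = 127`), an int32 accumulator `acc` of shape (output channels, elements), and a single `weight_scale` factor are all assumed here, and the names `quantize`, `recover`, and `two_stage_requantize` are hypothetical.

```python
import numpy as np

QMAX = 127  # assumption: symmetric signed 8-bit quantization range


def quantize(x, max_val):
    """Map real values in [-max_val, max_val] onto int8 codes."""
    scale = max(float(max_val), 1e-12) / QMAX  # guard against an all-zero channel
    return np.clip(np.rint(x / scale), -QMAX, QMAX).astype(np.int8)


def recover(q, max_val):
    """Map int8 codes back to approximate real values."""
    return q.astype(np.float32) * (max(float(max_val), 1e-12) / QMAX)


def two_stage_requantize(acc, prev_global_max, weight_scale):
    """Hypothetical helper. acc: int32 layer output, shape (channels, n)."""
    channels, n = acc.shape
    local_q = np.empty((channels, n), dtype=np.int8)   # stored local quantization data
    local_max = np.empty(channels, dtype=np.float32)   # one local maximum per channel
    in_scale = prev_global_max / QMAX                  # scale of the previous layer's output

    # First quantization, one output channel at a time: only a single
    # channel of recovered floating-point data exists at any moment.
    for c in range(channels):
        rec = acc[c].astype(np.float32) * in_scale * weight_scale  # global recovery data
        local_max[c] = np.abs(rec).max()
        local_q[c] = quantize(rec, local_max[c])

    # The global maximum of the current layer is known once every channel is done.
    global_max = float(local_max.max())

    # Second quantization: recover each channel with its local maximum,
    # then requantize with the layer-wide global maximum.
    global_q = np.empty_like(local_q)
    for c in range(channels):
        global_q[c] = quantize(recover(local_q[c], local_max[c]), global_max)

    return local_q, local_max, global_q, global_max
```

In this sketch the floating-point working buffer holds one output channel at a time rather than the whole layer output, which is one way to read the reduced memory use recited in the wherein clause. `global_q` and `global_max` would then be passed to the next layer as its input data and scale.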