US 12,106,219 B2
Method and apparatus with neural network data quantizing
Seungkyu Choi, Daejeon (KR); Sangwon Ha, Seongnam-si (KR); Lee-Sup Kim, Daejeon (KR); and Jaekang Shin, Daejeon (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR); and Korea Advanced Institute of Science and Technology, Daejeon (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR); and Korea Advanced Institute of Science and Technology, Daejeon (KR)
Filed on May 13, 2020, as Appl. No. 15/931,362.
Claims priority of application No. 10-2019-0126298 (KR), filed on Oct. 11, 2019.
Prior Publication US 2021/0110270 A1, Apr. 15, 2021
Int. Cl. G06N 3/084 (2023.01); G06N 3/04 (2023.01); G06N 3/0495 (2023.01)
CPC G06N 3/084 (2013.01) [G06N 3/04 (2013.01); G06N 3/0495 (2023.01)] 20 Claims
OG exemplary drawing
 
1. A processor-implemented method for an electronic device including a memory in a processor of the electronic device, the method comprising:
obtaining local quantization data by firstly quantizing, based on a local maximum value for each output channel of a current feedforward layer among sequential feedforward layers of a neural network, global recovery data obtained by recovering output data of an operation of the current layer based on a global maximum value output from a previous feedforward layer immediately preceding the current layer among the sequential feedforward layers;
storing the local quantization data in the memory to perform an operation of a next layer of the neural network;
obtaining global quantization data by secondarily quantizing, based on a global maximum value corresponding to the current layer, local recovery data obtained by recovering the local quantization data based on the local maximum value for each output channel of the current layer; and
providing the global quantization data as input data for the operation of the next layer,
wherein, as the global recovery data is sequentially quantized for each output channel of the current layer, a resource used by the memory for the operation is reduced compared to using the memory for all output channels of the current layer.
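The two-stage quantization recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the bit width, the uniform symmetric quantizer, and all names (`quantize`, `recover`, `local_max`, `global_max`) are assumptions made for illustration only. The sketch quantizes a layer's recovered output per output channel against each channel's local maximum (the data that would be stored in memory), then recovers it and requantizes against a single global maximum to produce the input for the next layer's operation.

```python
import numpy as np

BITS = 8
QMAX = (1 << (BITS - 1)) - 1  # 127 for signed 8-bit (assumed bit width)

def quantize(x, max_val):
    """Uniform symmetric quantization of x against max_val (illustrative)."""
    scale = max_val / QMAX
    return np.round(x / scale).astype(np.int32), scale

def recover(q, scale):
    """Dequantize integer data back to real values."""
    return q.astype(np.float64) * scale

# Stand-in for the global recovery data: the current layer's output,
# recovered using the previous layer's global maximum value.
rng = np.random.default_rng(0)
global_recovery = rng.normal(size=(4, 16))  # 4 output channels x 16 values

# First quantization: a local maximum value per output channel.
local_max = np.abs(global_recovery).max(axis=1, keepdims=True)
local_scale = local_max / QMAX
local_q = np.round(global_recovery / local_scale).astype(np.int32)
# local_q is the local quantization data stored in memory, one channel
# at a time, rather than buffering all channels at full precision.

# Second quantization: recover with the local maxima, then requantize
# with the single global maximum value corresponding to the current layer.
local_recovery = local_q * local_scale
global_max = np.abs(global_recovery).max()
global_q, g_scale = quantize(local_recovery, global_max)

# global_q is the global quantization data provided as input data for
# the next layer's operation; err bounds the two-stage round-off.
err = np.abs(recover(global_q, g_scale) - global_recovery).max()
```

Because each channel's local maximum is no larger than the layer's global maximum, the per-channel quantization step is at least as fine as the global one, so the combined error of the two stages stays within roughly one global quantization step.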