US 11,948,074 B2
Method and apparatus with neural network parameter quantization
Sangil Jung, Suwon-si (KR); Changyong Son, Anyang-si (KR); Seohyung Lee, Seoul (KR); Jinwoo Son, Seoul (KR); and Chang Kyu Choi, Seongnam-si (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Apr. 30, 2019, as Appl. No. 16/398,710.
Claims priority of provisional application 62/677,348, filed on May 29, 2018.
Claims priority of provisional application 62/671,105, filed on May 14, 2018.
Claims priority of application No. 10-2018-0109197 (KR), filed on Sep. 12, 2018.
Prior Publication US 2019/0347550 A1, Nov. 14, 2019
Int. Cl. G06N 3/08 (2023.01); B25J 9/16 (2006.01); G06N 3/04 (2023.01); G06V 40/00 (2022.01)
CPC G06N 3/08 (2013.01) [B25J 9/161 (2013.01); B25J 9/163 (2013.01); G06N 3/04 (2013.01); G06V 40/00 (2022.01)] 26 Claims
OG exemplary drawing
 
1. A processor-implemented data processing method in a neural network, the data processing method comprising:
performing an inference operation by implementing a current convolutional layer of the neural network, provided with an input activation map, to generate an output activation map, where a kernel weight of the current convolutional layer is quantized to a first representation bit number from a trained kernel weight for the current convolutional layer; and
outputting another activation map that includes activation results of the current convolutional layer quantized to a second representation bit number, within a range represented by an activation quantization parameter that includes a first parameter and a second parameter,
wherein the first parameter is dependent on first and second thresholds derived from the output activation map, and the second parameter is dependent on the first and second thresholds, and
wherein the quantization of the activation results is dependent on the first parameter and the second parameter.
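
The following is a minimal sketch, in Python/NumPy, of the quantization flow the claim describes. It assumes per-tensor min/max thresholds and interprets the claim's first and second parameters as a quantization step size and an offset derived from those two thresholds; the function names, the bit-width defaults, and the single-channel "valid" convolution are illustrative assumptions, not the patented implementation.

```python
import numpy as np


def quantize_tensor(x, num_bits, t_min, t_max):
    """Uniformly quantize x to num_bits within [t_min, t_max].

    The step size and offset below stand in for the claim's first and
    second parameters; both are derived from the two thresholds
    (t_min, t_max), matching the dependency the claim recites.
    """
    levels = 2 ** num_bits - 1
    step = max((t_max - t_min) / levels, 1e-12)   # assumed "first parameter"
    offset = t_min                                 # assumed "second parameter"
    q = np.clip(np.round((x - offset) / step), 0, levels)
    return q * step + offset                       # fake-quantized values


def quantized_conv_layer(input_map, trained_kernel, weight_bits=8, act_bits=8):
    """One convolutional layer with quantized kernel weights and activations."""
    # 1. Quantize the trained kernel weights to the first representation bit number.
    kernel_q = quantize_tensor(trained_kernel, weight_bits,
                               trained_kernel.min(), trained_kernel.max())

    # 2. Inference: a toy single-channel 2-D "valid" convolution with the
    #    quantized kernel produces the output activation map.
    kh, kw = kernel_q.shape
    out_h = input_map.shape[0] - kh + 1
    out_w = input_map.shape[1] - kw + 1
    output_map = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            output_map[i, j] = np.sum(input_map[i:i + kh, j:j + kw] * kernel_q)

    # 3. Derive the first and second thresholds from the output activation map,
    #    then quantize the activation results to the second representation bit
    #    number, yielding the "other" activation map of the claim.
    t1, t2 = output_map.min(), output_map.max()
    return quantize_tensor(output_map, act_bits, t1, t2)
```

For example, quantized_conv_layer(np.random.randn(8, 8), np.random.randn(3, 3), weight_bits=8, act_bits=4) returns a 6x6 activation map whose values take at most 16 distinct levels spanning the min/max range of the convolution output.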