US 12,443,829 B2
Neural network processing method and apparatus based on nested bit representation
Seohyung Lee, Seoul (KR); Youngjun Kwak, Seoul (KR); Jinwoo Son, Seoul (KR); Changyong Son, Anyang-si (KR); Sangil Jung, Suwon-si (KR); Chang Kyu Choi, Seongnam-si (KR); and Jaejoon Han, Seoul (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Aug. 12, 2019, as Appl. No. 16/537,752.
Claims priority of application No. 10-2018-0165585 (KR), filed on Dec. 19, 2018.
Prior Publication US 2020/0202199 A1, Jun. 25, 2020
Int. Cl. G06N 3/063 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/063 (2013.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01)] 9 Claims
OG exemplary drawing
 
1. A processor-implemented training method, comprising:
executing an iterative process, by one or more processors, with training data and an in-training first neural network, configured to perform a first task, to generate a trained first neural network, configured to perform the first task and a second task different from the first task, that has a trained first layer including trained first weights that each have a first bit-width corresponding to a first precision, the iterative process including:
quantizing in-training first weights, having the first bit-width, of an in-training first layer of the in-training first neural network to generate second weights of a first layer of a second neural network that have a second bit-width that is less than the first bit-width;
executing the second neural network using the second weights, including applying the training data to the first layer of the second neural network and determining loss values, corresponding to the second task, of the first layer of the second neural network;
updating the in-training first weights of the in-training first layer of the in-training first neural network based on the determined loss values; and
performing, for each of the updated in-training first weights, a quantization of a corresponding updated in-training first weight of the updated in-training first weights to generate a corresponding first weight of the trained first weights that includes a nested second weight having the second bit-width that shares bits with the corresponding first weight,
wherein the updating of the in-training first weights comprises updating the in-training first weights of the first bit-width based on statistical information of loss gradients corresponding to the determined loss values,
wherein the updating of the in-training first weights further comprises calculating the statistical information by assigning a high weighted value to a loss gradient corresponding to a weight for which a high priority is set among the second weights of the second bit-width, and
wherein the nested second weight is nested in the corresponding first weight and stored in a same memory space.
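The claim above describes a nested bit representation: after training, each high-precision weight stores a lower-precision weight in its own upper bits, so both precisions occupy the same memory space. The following is a minimal illustrative sketch, not taken from the patent; the bit-widths (8-bit first weights, 4-bit nested second weights), the uniform symmetric quantizer, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def quantize(weights, bits):
    """Uniform symmetric quantization of float weights to signed integers.
    Returns the integer codes and the scale used (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def nested_low_bit_view(q8, low_bits=4):
    """Extract the nested low-bit weight from the top bits of the 8-bit weight.
    Because the low-bit value is read out of bits the 8-bit value already
    occupies, the low-precision network needs no separate weight storage."""
    shift = 8 - low_bits
    # NumPy's right shift on signed integers is arithmetic, so sign is kept.
    return (q8 >> shift).astype(np.int8)

# Toy first-layer weights (hypothetical values for illustration).
w = np.array([0.9, -0.5, 0.1, -0.05], dtype=np.float32)
q8, s = quantize(w, bits=8)              # first weights, first bit-width (8)
q4 = nested_low_bit_view(q8, low_bits=4) # second weights nested in q8's top bits
```

Under this sketch, the second-bit-width network is executed by reading `q4` directly out of `q8`; an update to the stored 8-bit weight simultaneously updates the nested 4-bit weight, which mirrors the claim's requirement that the nested second weight share bits and memory with the corresponding first weight.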