US 12,079,592 B2
Deep neural network accelerator including lookup table based bit-serial processing elements
Hoi-Jun Yoo, Daejeon (KR); and Jin Mook Lee, Seosan-si (KR)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Nov. 20, 2019, as Appl. No. 16/689,199.
Claims priority of application No. 10-2018-0143617 (KR), filed on Nov. 20, 2018; and application No. 10-2019-0146454 (KR), filed on Nov. 15, 2019.
Prior Publication US 2020/0160161 A1, May 21, 2020
Int. Cl. G06F 7/544 (2006.01); G06N 3/04 (2023.01); G06N 3/063 (2023.01); G06F 17/16 (2006.01)
CPC G06F 7/5443 (2013.01) [G06N 3/04 (2013.01); G06N 3/063 (2013.01); G06F 17/16 (2013.01); G06F 2207/4824 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A deep neural network accelerator comprising:
a feature loader configured to store input features;
a weight memory configured to store a weight; and
a processing element comprising at least one lookup table bundle and an accumulator, wherein the processing element is configured to, in a current operation:
obtain the input features from the feature loader, the input features comprising at least one input feature utilized in a previous operation prior to the current operation;
obtain weight values;
generate results according to the weight values by applying the weight values to the input features including the at least one input feature utilized in the previous operation;
receive, by the at least one lookup table bundle, at least one target weight corresponding to the input features; and
generate output features by:
selecting a target result corresponding to the received at least one target weight from among the results; and
accumulating, by the accumulator, the target result,
wherein, based on the obtained weight values having a 1-bit precision, each of the at least one input feature utilized in the previous operation is utilized in the current operation,
wherein, based on the obtained weight values having a precision that is greater than 1-bit precision, fewer than all of the at least one input feature utilized in the previous operation are utilized in the current operation, and
wherein the processing element further comprises:
file registers configured to store the results;
multiplexers configured to receive the at least one target weight and to select the target result from the file registers; and
a weight input circuit configured to receive, as the at least one target weight, weight values whose number of bits is equal to a product of the number of the input features and the number of the multiplexers, and to transfer the received weight values to the multiplexers.
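The mechanism recited in claim 1 can be illustrated with a simplified software sketch of a lookup-table-based bit-serial multiply-accumulate: partial sums over a group of input features are precomputed once (the "results" held in the file registers), each bit-slice of the weights acts as a multiplexer select that picks the matching precomputed result, and the accumulator shifts and adds it. The function name, unsigned weights, and the table-per-bit-slice organization below are illustrative assumptions, not the patent's exact circuit.

```python
def lut_bitserial_mac(inputs, weights, weight_bits):
    """Illustrative sketch (assumed organization, unsigned weights only):
    LUT-based bit-serial dot product of `inputs` and `weights`."""
    n = len(inputs)
    # Precompute results for all 2^n weight-bit patterns
    # (these play the role of the file registers in the claim).
    table = [sum(x for i, x in enumerate(inputs) if (p >> i) & 1)
             for p in range(1 << n)]
    acc = 0
    for b in range(weight_bits):
        # Bit-slice b of the weights forms the multiplexer select signal.
        select = sum(((w >> b) & 1) << i for i, w in enumerate(weights))
        # Accumulate the selected target result, shifted by bit position.
        acc += table[select] << b
    return acc


# With 1-bit weights, a single lookup per group replaces n multiplications;
# higher-precision weights reuse the same table once per bit-slice.
print(lut_bitserial_mac([1, 2, 3], [2, 3, 1], weight_bits=2))  # 1*2+2*3+3*1 = 11
```

Note how weight precision only changes the number of passes over the same precomputed table, which is the reason the claim distinguishes how input features are reused between 1-bit and multi-bit weight operation.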