US 12,079,591 B2
Neural network device, method of operating the neural network device, and application processor including the neural network device
Hyunpil Kim, Seoul (KR); Hyunwoo Sim, Seoul (KR); Seongwoo Ahn, Yongin-si (KR); Hasong Kim, Hwaseong-si (KR); and Doyoung Lee, Suwon-si (KR)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Mar. 30, 2021, as Appl. No. 17/217,398.
Claims priority of application No. 10-2020-0042408 (KR), filed on Apr. 7, 2020; and application No. 10-2021-0001056 (KR), filed on Jan. 5, 2021.
Prior Publication US 2021/0312012 A1, Oct. 7, 2021
Int. Cl. G06F 17/16 (2006.01); G06F 5/01 (2006.01); G06F 7/22 (2006.01); G06F 7/487 (2006.01); G06F 7/499 (2006.01); G06F 7/544 (2006.01); G06N 3/02 (2006.01)
CPC G06F 7/4876 (2013.01) [G06F 5/01 (2013.01); G06F 7/22 (2013.01); G06F 7/49921 (2013.01); G06F 7/5443 (2013.01); G06F 17/16 (2013.01); G06N 3/02 (2013.01)]
16 Claims
 
1. A neural network device, comprising:
a floating-point arithmetic circuit configured to perform a dot product operation and an accumulation operation; and
a buffer configured to store first cumulative data generated by the floating-point arithmetic circuit,
wherein the floating-point arithmetic circuit comprises:
a sign and exponent calculator configured to identify a maximum value from a plurality of exponent addition results, obtained by respectively adding exponents of a plurality of floating-point data pairs, and an exponent value of the first cumulative data;
a multiplier configured to generate a plurality of fraction multiplication results by respectively multiplying fractions of the plurality of floating-point data pairs;
an align shifter configured to perform, based on the maximum value, an align shift of the plurality of fraction multiplication results and a fraction part of the first cumulative data;
an adder configured to perform a summation of the plurality of aligned fraction multiplication results and the aligned fraction part of the first cumulative data; and
a leading one detection circuit configured to perform a leading one detection on addition result data generated from the summation, to generate shift information,
wherein the leading one detection circuit comprises:
a first leading one detection circuit configured to perform a first leading one detection on a first number of upper bits of the addition result data generated from the summation; and
a second leading one detection circuit configured to perform, in parallel with the first leading one detection, a second leading one detection on a second number of lower bits of the addition result data, the upper bits including a most significant bit (MSB) of the addition result data, and the lower bits including a least significant bit (LSB) of the addition result data,
wherein the floating-point arithmetic circuit is configured to shift the addition result data based on results of the first leading one detection and the second leading one detection, and
wherein the first number is less than the second number.
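
The datapath recited in claim 1 can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the patented circuit: each operand is a hypothetical (sign, exponent, fraction) triple representing (-1)^sign * fraction * 2^exponent with an integer fraction, and the constants WIDTH, UPPER_BITS, and GUARD, the helper names, and the rule for combining the two detection results are illustrative choices that the claim does not specify. The two detections run one after the other here, whereas the claim recites them operating in parallel, and aligned bits shifted past the guard bits are simply truncated.

```python
WIDTH = 32       # assumed bit width of the addition result data
UPPER_BITS = 8   # assumed "first number" of upper bits (includes the MSB)
GUARD = 8        # assumed guard bits kept below the aligned fractions


def leading_zeros(value, width):
    """Count the zeros above the leading one in a width-bit field."""
    return width - value.bit_length()


def split_lod(value, width=WIDTH, upper_bits=UPPER_BITS):
    """Leading one detection split into an upper and a lower detector."""
    lower_bits = width - upper_bits          # the "second number" of bits
    assert upper_bits < lower_bits           # first number < second number
    upper = value >> lower_bits              # field containing the MSB
    lower = value & ((1 << lower_bits) - 1)  # field containing the LSB
    lz_upper = leading_zeros(upper, upper_bits)  # first detection
    lz_lower = leading_zeros(lower, lower_bits)  # second detection
    # Assumed combination rule: the upper result wins when it finds a one.
    return lz_upper if upper else upper_bits + lz_lower


def dot_accumulate(pairs, acc):
    """Dot product of operand pairs, accumulated onto acc."""
    # Sign and exponent calculator: add exponents, then find the maximum
    # over the addition results and the accumulator exponent.
    exp_sums = [a[1] + b[1] for a, b in pairs]
    signs = [a[0] ^ b[0] for a, b in pairs]
    max_exp = max(exp_sums + [acc[1]])
    # Multiplier: one fraction product per pair.
    products = [a[2] * b[2] for a, b in pairs]
    # Align shifter and adder: align every term to max_exp, then sum.
    total = 0
    for sign, exp, frac in list(zip(signs, exp_sums, products)) + [acc]:
        aligned = (frac << GUARD) >> (max_exp - exp)
        total += -aligned if sign else aligned
    magnitude = abs(total)
    assert magnitude < (1 << WIDTH), "sum exceeds the assumed adder width"
    if magnitude == 0:
        return (0, 0, 0)
    # Leading one detection yields the shift used for normalization.
    shift = split_lod(magnitude)
    return (1 if total < 0 else 0, max_exp - GUARD - shift, magnitude << shift)


def to_float(triple):
    """Convert a (sign, exponent, fraction) triple to a Python float."""
    sign, exp, frac = triple
    return (-1) ** sign * frac * 2.0 ** exp
```

As a usage example under the same assumptions, accumulating the pairs ((0, 0, 3), (0, 1, 5)) and ((1, 2, 2), (0, 0, 7)) onto (0, 1, 4) corresponds to 30 - 56 + 8, and to_float applied to the result of dot_accumulate gives -18.0. In that case the upper 8-bit field is all zeros, so the shift of 21 comes from the lower 24-bit detection.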