US 12,001,929 B2
Mixed-precision neural processing unit (NPU) using spatial fusion with load balancing
Hamzah Abdelaziz, San Jose, CA (US); Joseph Hassoun, Los Gatos, CA (US); and Ali Shafiee Ardestani, Santa Clara, CA (US)
Assigned to SAMSUNG ELECTRONICS CO., LTD., (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Jun. 10, 2020, as Appl. No. 16/898,433.
Claims priority of provisional application 63/003,883, filed on Apr. 1, 2020.
Prior Publication US 2021/0312325 A1, Oct. 7, 2021
Int. Cl. G06N 20/00 (2019.01); H04L 67/1001 (2022.01)
CPC G06N 20/00 (2019.01) [H04L 67/1001 (2022.05)]
20 Claims
 
1. An apparatus, comprising:
a machine-learning system comprising:
a precision-determination circuit configured to separate a first value of a data set into a second value and a third value, the first value comprising a first predetermined number of bits, the second value comprising a second predetermined number of contiguous most-significant bits of the first value and the third value comprising a third predetermined number of contiguous least-significant bits of the first value, the first predetermined number equaling the second predetermined number plus the third predetermined number;
a load-balancing circuit configured to:
select for computation:
a first combination comprising the second value and the second predetermined number of contiguous bits of a first weight value corresponding to the first value, the second value comprising a first multiplicand of the first combination and the second predetermined number of contiguous bits of the first weight value comprising a second multiplicand of the first combination, and
a second combination comprising the third value and the third predetermined number of bits of the first weight value, the third value comprising a first multiplicand of the second combination and the third predetermined number of contiguous bits of the first weight value comprising a second multiplicand of the second combination; and
a first computation circuit configured to compute a first partial product of the first multiplicand and the second multiplicand of the first combination based on the load-balancing circuit being configured to alternate between loading the first computation circuit with (a) most-significant bits of a first data set and most-significant bits of a weight value of the first data set, and (b) least-significant bits of a second data set and least-significant bits of a weight value of the second data set; and
a second computation circuit configured to compute a second partial product of the first multiplicand and the second multiplicand of the second combination based on the load-balancing circuit being configured to alternate between loading the second computation circuit with (a) least-significant bits of the first data set and least-significant bits of the weight value of the first data set, and (b) most-significant bits of the second data set and most-significant bits of the weight value of the second data set.
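
The data flow recited in claim 1 can be illustrated with a minimal software sketch. This is not the patented circuit. It assumes unsigned 8-bit values split into two 4-bit halves, and it treats each successive (value, weight) pair as one "data set". The names split_value and schedule are hypothetical and are introduced only for this illustration. Only the two combinations recited in the claim (most-significant by most-significant, and least-significant by least-significant) are computed.

```python
def split_value(value, total_bits=8, msb_bits=4):
    """Split an unsigned value into its contiguous MSB part and LSB part.

    Assumption: total_bits = msb_bits + lsb_bits, mirroring the claim's
    "first number equals second number plus third number".
    """
    lsb_bits = total_bits - msb_bits
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    high = value >> lsb_bits                # contiguous most-significant bits
    low = value & ((1 << lsb_bits) - 1)     # contiguous least-significant bits
    return high, low


def schedule(pairs, total_bits=8, msb_bits=4):
    """Alternate which computation unit receives MSB work and LSB work.

    pairs: list of (value, weight) tuples, one per data set (assumption).
    Returns two lists of (kind, partial_product), one list per unit.
    """
    unit0, unit1 = [], []
    for index, (value, weight) in enumerate(pairs):
        v_hi, v_lo = split_value(value, total_bits, msb_bits)
        w_hi, w_lo = split_value(weight, total_bits, msb_bits)
        msb_product = v_hi * w_hi   # first combination in the claim
        lsb_product = v_lo * w_lo   # second combination in the claim
        if index % 2 == 0:
            # Even data sets: unit 0 takes the MSB pair, unit 1 the LSB pair.
            unit0.append(("msb", msb_product))
            unit1.append(("lsb", lsb_product))
        else:
            # Odd data sets: the roles are swapped.
            unit0.append(("lsb", lsb_product))
            unit1.append(("msb", msb_product))
    return unit0, unit1


if __name__ == "__main__":
    first_unit, second_unit = schedule([(0xA7, 0x35), (0x12, 0xF0)])
    print(first_unit)
    print(second_unit)
```

In this sketch the swap on odd-numbered data sets stands in for the alternation performed by the load-balancing circuit, so that neither unit is always handed the most-significant halves.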