US 12,443,841 B2
Four-bit training for machine learning
Xiao Sun, Pleasantville, NY (US); Ankur Agrawal, Chappaqua, NY (US); Kailash Gopalakrishnan, New York, NY (US); Naigang Wang, Ossining, NY (US); Chia-Yu Chen, White Plains, NY (US); and Jiamin Ni, Yorktown Heights, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Dec. 4, 2020, as Appl. No. 17/112,528.
Prior Publication US 2022/0180171 A1, Jun. 9, 2022
Int. Cl. G06N 3/08 (2023.01); G06F 7/499 (2006.01); G06F 17/16 (2006.01); G06N 3/063 (2023.01)
CPC G06N 3/08 (2013.01) [G06F 7/499 (2013.01); G06F 17/16 (2013.01); G06N 3/063 (2013.01)]
18 Claims
 
1. An apparatus comprising:
a floating-point gradient register;
an integer register;
a memory bank; and
an array of processing units having M rows and N columns, each of said processing units in turn comprising:
a plurality of binary shifters, each having an integer input configured to obtain corresponding bits of a 4-bit integer multiplicand, a shift-specifying input configured to obtain corresponding bits in an exponent field of a 4-bit floating point multiplier, the multiplier being specified in a mantissaless four-bit floating point format comprising a sign bit, three exponent bits, and no mantissa bits, and an output;
an adder tree having a plurality of inputs coupled to said outputs of said plurality of shifters, and having an output; and
a rounder having an input coupled to said output of said adder tree and having an output;
wherein:
said integer inputs of said processing units are connected to said integer register;
said shift-specifying inputs of said processing units are connected to said floating-point gradient register; and
said outputs of said rounders are coupled to said memory bank.
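
The datapath of one processing unit recited in claim 1 can be illustrated with a minimal behavioral sketch. It is not taken from the patent. The function names, the bit layout (sign in bit 3, exponent in bits 2..0), the unbiased use of the exponent as a left-shift amount, the signed range of the 4-bit integer, and the round-half-up rule with a configurable number of dropped bits are all assumptions made for illustration. The point it shows is that, because the multiplier has no mantissa bits, it is a signed power of two, so each multiplication reduces to a binary shift.

```python
def decode_fp4(code):
    """Split a 4-bit mantissaless code into (sign, shift).

    Assumed layout: bit 3 is the sign, bits 2..0 are the exponent.
    The exponent is used directly as a shift amount (no bias assumed).
    """
    if not 0 <= code <= 0b1111:
        raise ValueError("expected a 4-bit code")
    sign = -1 if code & 0b1000 else 1
    shift = code & 0b0111
    return sign, shift


def shift_multiply(int4, code):
    """Model one binary shifter: integer multiplicand times a signed power of two."""
    if not -8 <= int4 <= 7:  # assumed signed 4-bit range
        raise ValueError("expected a signed 4-bit integer")
    sign, shift = decode_fp4(code)
    return sign * (int4 << shift)


def adder_tree(values):
    """Sum the shifter outputs pairwise, level by level."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels with a zero operand
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def round_shift(total, drop_bits):
    """Model the rounder: drop low-order bits, rounding half up (assumed mode)."""
    if drop_bits == 0:
        return total
    return (total + (1 << (drop_bits - 1))) >> drop_bits


def processing_unit(int4_values, fp4_codes, drop_bits=0):
    """Shifters, then adder tree, then rounder, for one processing unit."""
    products = [shift_multiply(a, g) for a, g in zip(int4_values, fp4_codes)]
    return round_shift(adder_tree(products), drop_bits)


if __name__ == "__main__":
    ints = [3, -2, 7, 1]                       # values from the integer register
    grads = [0b0010, 0b1001, 0b0000, 0b1111]   # codes from the gradient register
    print(processing_unit(ints, grads))               # -105
    print(processing_unit(ints, grads, drop_bits=2))  # -26
```

In this sketch the four products are 12, 4, 7, and -128, which sum to -105. An M by N array in the sense of the claim would instantiate this unit once per row and column position, with the rounded results written to the memory bank.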