US 11,989,258 B2
Performing matrix multiplication in hardware
Andrew Everett Phelps, Middleton, WI (US); and Norman Paul Jouppi, Palo Alto, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Nov. 9, 2020, as Appl. No. 17/093,439.
Application 17/093,439 is a continuation of application No. 16/826,075, filed on Mar. 20, 2020, granted, now 10,831,862.
Application 16/826,075 is a continuation of application No. 15/983,047, filed on May 17, 2018, granted, now 10,621,269, issued on Apr. 14, 2020.
Claims priority of provisional application 62/507,748, filed on May 17, 2017.
Prior Publication US 2021/0124795 A1, Apr. 29, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 17/16 (2006.01); G06F 7/483 (2006.01); G06F 7/487 (2006.01); G06F 9/30 (2018.01); G06N 3/02 (2006.01); G06N 3/048 (2023.01); G06N 3/063 (2023.01)
CPC G06F 17/16 (2013.01) [G06F 7/483 (2013.01); G06F 7/4876 (2013.01); G06F 9/30014 (2013.01); G06N 3/02 (2013.01); G06N 3/048 (2023.01); G06N 3/063 (2013.01)]
21 Claims
1. A method of performing a matrix multiplication using a hardware circuit, the method comprising:
obtaining, by a matrix computation unit of the hardware circuit, an activation input value and a weight input value, the activation input value and the weight input value each having a first floating point format, wherein the hardware circuit is configured to perform computations for a neural network having a plurality of layers, wherein the activation input value and the weight input value are associated with a layer of the plurality of layers;
wherein the first floating point format is a 16-bit format, comprising: one available bit for a sign, eight available bits for an exponent, and seven available bits for a significand to represent a floating point in the first floating point format;
multiplying, by a multiplication circuitry of the matrix computation unit, the activation input value and the weight input value to generate a product value, the product value having a second floating point format, wherein the second floating point format is different from and has a higher precision than the first floating point format;
obtaining, by the matrix computation unit, a partial sum value in a third floating point format, wherein the third floating point format is different from the first floating point format and the second floating point format and has a higher precision than the first floating point format; and
combining, by a summation circuitry of the hardware circuit, at least the partial sum value and the product value to generate an updated partial sum value that has the third floating point format.
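
The data flow recited in claim 1 can be illustrated with a short software sketch. This is only an emulation under stated assumptions, not the claimed hardware. The 16-bit layout (1 sign bit, 8 exponent bits, 7 significand bits) comes from the claim. Everything else is assumed for illustration: the function names are hypothetical, the 16-bit value is modeled as the upper half of an IEEE 754 binary32 word, rounding is round-to-nearest-even, the second (product) format is modeled as a Python float (binary64), and the third (partial sum) format is modeled as binary32. Inputs are assumed to be finite and within range.

```python
import struct


def to_bits16(x: float) -> int:
    """Round x to the 16-bit format: 1 sign, 8 exponent, 7 significand bits.

    Assumption: the value is the upper 16 bits of an IEEE 754 binary32 word,
    rounded to nearest even. The claim does not specify a rounding mode.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def from_bits16(b: int) -> float:
    """Expand a 16-bit pattern back to a Python float (exact, no rounding)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def split_fields(b: int) -> tuple:
    """Return the (sign, exponent, significand) bit fields of a 16-bit pattern."""
    sign = (b >> 15) & 0x1          # 1 bit
    exponent = (b >> 7) & 0xFF      # 8 bits, biased by 127
    significand = b & 0x7F          # 7 bits, hidden leading 1 not stored
    return sign, exponent, significand


def to_float32(x: float) -> float:
    """Round a Python float to binary32 (stand-in for the third format)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def multiply_accumulate(activation_bits: int, weight_bits: int,
                        partial_sum: float) -> float:
    """Multiply two first-format values and add the result to a partial sum.

    The product is held as a Python float (binary64), which is wider than
    the 16-bit inputs. The updated partial sum is rounded to binary32.
    """
    product = from_bits16(activation_bits) * from_bits16(weight_bits)
    return to_float32(to_float32(partial_sum) + product)


if __name__ == "__main__":
    a = to_bits16(1.5)   # activation input value in the 16-bit format
    w = to_bits16(2.0)   # weight input value in the 16-bit format
    print(split_fields(a))
    print(multiply_accumulate(a, w, 0.25))
```

Keeping the product and the running sum in wider formats than the inputs means that narrowing the inputs to 16 bits does not also narrow the accumulation, which is the property the claim expresses through its second and third floating point formats.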