US 11,676,003 B2
Training neural network accelerators using mixed precision data formats
Bita Darvish Rouhani, Bellevue, WA (US); Taesik Na, Sammamish, WA (US); Eric S. Chung, Woodinville, WA (US); Daniel Lo, Bothell, WA (US); and Douglas C. Burger, Bellevue, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Dec. 18, 2018, as Appl. No. 16/223,603.
Prior Publication US 2020/0193274 A1, Jun. 18, 2020
Int. Cl. G06N 20/00 (2019.01); G06N 3/063 (2023.01); G06F 17/15 (2006.01); G06F 17/16 (2006.01); G06N 3/084 (2023.01)
CPC G06N 3/063 (2013.01) [G06F 17/15 (2013.01); G06F 17/16 (2013.01); G06N 3/084 (2013.01); G06N 20/00 (2019.01)] 23 Claims
OG exemplary drawing
 
1. A computing system comprising:
a computer-readable memory storing an operational parameter of a given layer of a neural network; and
a hardware accelerator in communication with the computer-readable memory for accelerating tensor operations, the hardware accelerator configured to:
receive an input tensor for a given layer of a multi-layer neural network;
convert the input tensor from a normal-precision floating-point format to a quantized-precision floating-point format, the quantized-precision floating-point format being a block floating-point format, wherein a first converted input tensor portion corresponding to a first portion of the input tensor comprises a first common exponent for values in the first portion of the input tensor and a first plurality of mantissa values, and a second converted input tensor portion corresponding to a second portion of the input tensor comprises a second common exponent for values in the second portion of the input tensor and a second plurality of mantissa values, wherein the first common exponent is different from the second common exponent;
perform a tensor operation using the input tensor converted to the quantized-precision floating-point format;
convert a result of the tensor operation from the quantized-precision floating-point format to the normal-precision floating-point format to provide a converted result in the normal-precision floating-point format; and
in a training iteration of a plurality of iterations of training of the multi-layer neural network, update the operational parameter of the given layer of the multi-layer neural network stored in the computer-readable memory using the converted result in the normal-precision floating-point format, wherein the operational parameter of the given layer of the neural network is stored in the normal-precision floating-point format.
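The block floating-point conversion recited in the claim, in which each portion (block) of the input tensor shares one common exponent while keeping per-value mantissas, can be sketched as follows. This is an illustrative NumPy implementation, not the patent's own method; the function names, the block size of 4, and the 8-bit mantissa width are assumptions chosen for the example:

```python
import numpy as np

def to_bfp(x, block_size=16, mantissa_bits=8):
    """Quantize a 1-D float array to block floating-point:
    each block of `block_size` values shares one common exponent."""
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    # Shared exponent per block, taken from the largest-magnitude value,
    # so that every value in the block fits in the mantissa range.
    max_mag = np.max(np.abs(blocks), axis=1, keepdims=True)
    _, exps = np.frexp(max_mag)  # max_mag = m * 2**exps, 0.5 <= m < 1
    # Each mantissa is a signed integer with `mantissa_bits` bits.
    scale = 2.0 ** (exps - (mantissa_bits - 1))
    mantissas = np.clip(np.round(blocks / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(int)
    return exps, mantissas, scale

def from_bfp(mantissas, scale, n):
    """Dequantize back to normal-precision floats."""
    return (mantissas * scale).reshape(-1)[:n]

x = np.array([0.5, -0.25, 0.125, 3.0, -2.0, 1.5, 0.75, 0.1])
exps, mants, scale = to_bfp(x, block_size=4, mantissa_bits=8)
x_hat = from_bfp(mants, scale, len(x))
```

With a block size of 4, the two blocks here receive different shared exponents (as the claim's two tensor portions do), and values that are exact multiples of the block scale round-trip exactly, while others (such as 0.1) incur a small quantization error bounded by half the block scale.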