US 12,277,502 B2
Neural network activation compression with non-uniform mantissas
Daniel Lo, Bothell, WA (US); Amar Phanishayee, Seattle, WA (US); Eric S. Chung, Woodinville, WA (US); and Yiren Zhao, Cambridge (GB)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jan. 17, 2024, as Appl. No. 18/415,159.
Application 18/415,159 is a continuation of application No. 18/092,876, filed on Jan. 3, 2023, granted, now 12,067,495.
Application 18/092,876 is a continuation of application No. 16/256,998, filed on Jan. 24, 2019, granted, now 11,562,247, issued on Jan. 24, 2023.
Prior Publication US 2024/0152758 A1, May 9, 2024
Int. Cl. G06N 3/084 (2023.01); G06F 9/30 (2018.01); H03M 7/30 (2006.01); H03M 7/46 (2006.01)
CPC G06N 3/084 (2013.01) [G06F 9/30141 (2013.01); H03M 7/3059 (2013.01); H03M 7/46 (2013.01); H03M 7/702 (2013.01)]
20 Claims
 
1. A computing system comprising:
one or more hardware processors;
at least one memory coupled to the one or more hardware processors; and
one or more computer-readable storage media storing computer-executable instructions that, when executed, cause the computing system to perform operations comprising:
performing forward propagation for a first layer of a neural network using values in a first floating-point format to produce first activation values in the first floating-point format or a second floating-point format different than the first floating-point format, the first floating-point format and the second floating-point format having uniform mantissas;
converting at least one of the activation values to a third floating-point format having a non-uniform mantissa to provide compressed activation values;
storing the compressed activation values in the at least one memory; and
propagating the activation values in the first floating-point format or the second floating-point format to a second layer of the neural network to produce second activation values in a fourth floating-point format, the fourth floating-point format being different than the first floating-point format and being different than the second floating-point format.
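
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The level table LEVELS, the helper names forward, compress and decompress, and the (sign, exponent, level index) tuple layout are all hypothetical choices made for illustration. Ordinary Python floats stand in for the uniform-mantissa formats, and the sketch does not model the distinct fourth format of the second layer's outputs. The point shown is that the compressed copy is stored while the uncompressed activations are what propagate to the next layer.

```python
import math

# Hypothetical non-uniform mantissa levels: fractional parts in [0, 1) with
# deliberately uneven spacing. A 2-bit index selects one of these four values.
LEVELS = (0.0, 0.125, 0.5, 0.875)


def forward(weights, inputs):
    """One dense layer with ReLU, computed in ordinary (uniform-mantissa) floats."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]


def compress(x):
    """Map a float to (sign, exponent, level_index) using the non-uniform levels."""
    if x == 0.0:
        return (0, None, 0)  # assumption: exponent None marks an exact zero
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    m, e = math.frexp(abs(x))      # abs(x) == m * 2**e, with 0.5 <= m < 1
    frac = 2.0 * m - 1.0           # rewrite as (1 + frac) * 2**(e - 1)
    idx = min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - frac))
    return (sign, e - 1, idx)


def decompress(code):
    """Recover an approximate float from a compressed activation."""
    sign, exponent, idx = code
    if exponent is None:
        return 0.0
    value = (1.0 + LEVELS[idx]) * 2.0 ** exponent
    return -value if sign else value


# First layer: forward propagation produces the first activation values.
w1 = [[1.0, 0.5], [-1.0, 2.0]]
w2 = [[0.5, 0.5]]
first_acts = forward(w1, [1.0, 1.0])

# Compressed copies are what get stored in memory (for example, for later
# reuse during back propagation).
stored = [compress(a) for a in first_acts]

# The uncompressed activations, not the stored copies, feed the second layer.
second_acts = forward(w2, first_acts)
```

Rounding each fraction to the nearest entry in a small table is only one way to realize a non-uniform mantissa; the claim itself does not fix the number of levels or their spacing.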