US 11,742,875 B1
Compression of floating-point numbers for neural networks
Hsien-Kai Kuo, Hsinchu (TW); Huai-Ting Li, Hsinchu (TW); Shou-Yao Tseng, Hsinchu (TW); and Po-Yu Chen, Hsinchu (TW)
Assigned to MediaTek Inc., Hsinchu (TW)
Filed by MediaTek Inc., Hsinchu (TW)
Filed on Apr. 20, 2022, as Appl. No. 17/724,849.
Int. Cl. H03M 7/00 (2006.01); H03M 7/30 (2006.01); G06F 9/50 (2006.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01); G06N 3/063 (2023.01)
CPC H03M 7/30 (2013.01) [G06F 9/5027 (2013.01); G06N 3/04 (2013.01); G06N 3/063 (2013.01); G06N 3/08 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method of compressing floating-point numbers for neural network computations, comprising:
receiving a plurality of operands, wherein each operand has a floating-point representation of a sign bit, an exponent, and a fraction;
re-ordering the operands into a first sequence of consecutive sign bits, a second sequence of consecutive exponents, and a third sequence of consecutive fractions; and
compressing the first sequence, the second sequence, and the third sequence to remove at least duplicate exponents and to thereby losslessly generate a compressed data sequence.
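The claimed steps (split each float into its sign, exponent, and fraction fields; re-order the fields into three homogeneous streams; losslessly compress so duplicate exponents are removed) can be sketched in Python. The byte layout, the 24-bit fraction packing, and the use of DEFLATE as the back-end codec are illustrative assumptions here, not the encoding specified by the patent.

```python
import struct
import zlib

def split_fields(values):
    """Split each IEEE 754 binary32 value into (sign, 8-bit exponent, 23-bit fraction)."""
    signs, exps, fracs = [], [], []
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        signs.append(bits >> 31)            # 1 sign bit
        exps.append((bits >> 23) & 0xFF)    # 8 exponent bits
        fracs.append(bits & 0x7FFFFF)       # 23 fraction bits
    return signs, exps, fracs

def compress(values):
    """Re-order fields into three consecutive sequences, then compress losslessly."""
    signs, exps, fracs = split_fields(values)
    # Grouping all exponents together exposes their redundancy (neural-network
    # tensors often occupy a narrow exponent range), so a generic lossless
    # codec such as DEFLATE can squeeze out the duplicates. The patent's own
    # compression scheme may differ; zlib stands in for it here.
    sign_bytes = bytes(signs)
    exp_bytes = bytes(exps)
    frac_bytes = b"".join(f.to_bytes(3, "little") for f in fracs)
    payload = struct.pack("<I", len(values)) + sign_bytes + exp_bytes + frac_bytes
    return zlib.compress(payload)

def decompress(blob):
    """Invert compress(): recover the original float32 operands exactly."""
    payload = zlib.decompress(blob)
    n = struct.unpack("<I", payload[:4])[0]
    signs = payload[4:4 + n]
    exps = payload[4 + n:4 + 2 * n]
    fracs = payload[4 + 2 * n:]
    out = []
    for i in range(n):
        frac = int.from_bytes(fracs[3 * i:3 * i + 3], "little")
        bits = (signs[i] << 31) | (exps[i] << 23) | frac
        out.append(struct.unpack("<f", struct.pack("<I", bits))[0])
    return out
```

Because only the byte order is changed before the codec runs, the round trip is exact, matching the claim's "losslessly generate a compressed data sequence" requirement.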