US 11,868,770 B2
Computer processor for higher precision computations using a mixed-precision decomposition of operations
Gregory Henry, Hillsboro, OR (US); and Alexander Heinecke, San Jose, CA (US)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by INTEL CORPORATION, Santa Clara, CA (US)
Filed on Dec. 29, 2022, as Appl. No. 18/091,157.
Application 18/091,157 is a continuation of application No. 17/125,846, filed on Dec. 17, 2020, granted, now 11,126,428.
Application 17/125,846 is a continuation of application No. 17/069,230, filed on Oct. 13, 2020, granted, now 11,544,057.
Application 17/069,230 is a continuation of application No. 16/144,964, filed on Sep. 27, 2018, granted, now 10,853,067, issued on Dec. 1, 2020.
Prior Publication US 2023/0214215 A1, Jul. 6, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/30 (2018.01); G06F 7/483 (2006.01); G06F 7/487 (2006.01); G06F 7/53 (2006.01); G06F 7/485 (2006.01); G06F 7/544 (2006.01); G06F 1/16 (2006.01)

CPC G06F 9/30014 (2013.01) [G06F 1/16 (2013.01); G06F 7/483 (2013.01); G06F 7/485 (2013.01); G06F 7/4876 (2013.01); G06F 7/5324 (2013.01); G06F 7/5443 (2013.01); G06F 9/30025 (2013.01); G06F 9/30145 (2013.01)]

16 Claims

1. A processor comprising:

a cache to store data; and

a plurality of cores coupled to the cache, at least one core of the plurality of cores comprising:

execution circuitry to execute one or more instructions to perform a matrix multiplication with a first source matrix and a second source matrix to generate a result matrix, the execution of the one or more instructions comprising:

converting a first plurality of data elements of the first source matrix and a second plurality of data elements of the second source matrix from a single-precision floating point format to a reduced precision floating point format having fewer mantissa bits than the single-precision floating point format and a same number of exponent bits as the single-precision floating point format; and

performing a plurality of parallel fused multiply-add operations to multiply the first plurality of data elements in the reduced precision floating point format by corresponding data elements of the second plurality of data elements in the reduced precision floating point format to generate a plurality of products, and to add the plurality of products to accumulated values to generate single-precision floating point data elements of the result matrix.
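
The two steps recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed hardware and not the patentee's implementation. It assumes the reduced precision format is bfloat16 (8 exponent bits, as in single precision, and 7 stored mantissa bits), which is one format matching the claim language; the claim itself does not name a format. The helper names (to_bf16, to_f32, matmul_bf16) are hypothetical, inputs are assumed finite and within single-precision range, and the parallel fused multiply-add operations are emulated serially by forming each product in double precision and rounding the sum to single precision, which only approximates a single rounding step.

```python
import struct


def _f32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_f32(bits: int) -> float:
    """Return the value of a single-precision bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_f32(x: float) -> float:
    """Round a Python float (double) to single precision."""
    return _bits_f32(_f32_bits(x))


def to_bf16(x: float) -> float:
    """Round x to bfloat16 (assumed format) using round-to-nearest-even.

    The sign and the 8 exponent bits are kept; the mantissa is cut from
    23 bits to 7. NaN and infinity are not handled in this sketch.
    """
    bits = _f32_bits(x)
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return _bits_f32((bits + bias) & 0xFFFF0000)


def matmul_bf16(a, b):
    """Multiply matrices a (m x k) and b (k x n), given as lists of rows.

    Step 1: convert every source element to the reduced precision format.
    Step 2: multiply corresponding elements and add each product to a
    single-precision accumulator.
    """
    m, k, n = len(a), len(b), len(b[0])
    a16 = [[to_bf16(v) for v in row] for row in a]
    b16 = [[to_bf16(v) for v in row] for row in b]
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # A product of two bfloat16 values is exact in double
                # precision; only the accumulated sum is rounded.
                acc = to_f32(a16[i][p] * b16[p][j] + acc)
            c[i][j] = acc
    return c


if __name__ == "__main__":
    print(to_bf16(3.14159))
    print(matmul_bf16([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

Because the exponent width is unchanged, the conversion in this sketch preserves the dynamic range of the single-precision inputs and loses only low-order mantissa bits, while the accumulator keeps the result elements in single precision.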