CPC G06F 9/30014 (2013.01) [G06F 1/16 (2013.01); G06F 7/483 (2013.01); G06F 7/485 (2013.01); G06F 7/4876 (2013.01); G06F 7/5324 (2013.01); G06F 7/5443 (2013.01); G06F 9/30025 (2013.01); G06F 9/30145 (2013.01)]
16 Claims
1. A processor comprising:
a cache to store data; and
a plurality of cores coupled to the cache, at least one core of the plurality of cores comprising:
execution circuitry to execute one or more instructions to perform a matrix multiplication with a first source matrix and a second source matrix to generate a result matrix, the execution of the one or more instructions comprising:
converting a first plurality of data elements of the first source matrix and a second plurality of data elements of the second source matrix from a single-precision floating point format to a reduced precision floating point format having fewer mantissa bits than the single-precision floating point format and a same number of exponent bits as the single-precision floating point format; and
performing a plurality of parallel fused multiply-add operations to multiply the first plurality of data elements in the reduced precision floating point format by corresponding data elements of the second plurality of data elements in the reduced precision floating point format to generate a plurality of products, and to add the plurality of products to accumulated values to generate single-precision floating point data elements of the result matrix.
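The two steps recited in claim 1 can be illustrated with a minimal software sketch. It is a scalar model under stated assumptions, not the claimed circuitry. The claim does not name the reduced precision format; the sketch assumes 8 exponent bits and 7 mantissa bits (the layout commonly called bfloat16), which is one format that fits the description. Round-to-nearest-even conversion, finite inputs, and the function names `to_reduced`, `to_f32`, and `matmul_reduced` are likewise assumptions made for illustration. The loops run sequentially, whereas the claim recites parallel operations.

```python
import struct


def _f32_bits(x: float) -> int:
    """Return the 32-bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_f32(x: float) -> float:
    """Round a Python float (double) to single precision."""
    return _bits_f32(_f32_bits(x))


def to_reduced(x: float) -> float:
    """Convert single precision to the assumed reduced format.

    Keeps the sign and all 8 exponent bits and rounds the 23-bit mantissa
    to 7 bits (round-to-nearest-even, an assumption). Finite inputs only.
    """
    bits = _f32_bits(x)
    bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even neighbour
    return _bits_f32((bits + bias) & 0xFFFF0000)


def matmul_reduced(a, b, c):
    """Compute c + a @ b with reduced-precision inputs, fp32 accumulation."""
    m, k, n = len(a), len(b), len(b[0])
    # Step 1: convert both source matrices to the reduced format.
    a_r = [[to_reduced(v) for v in row] for row in a]
    b_r = [[to_reduced(v) for v in row] for row in b]
    # Step 2: multiply-add into single-precision accumulators.
    out = [[to_f32(v) for v in row] for row in c]
    for i in range(m):
        for j in range(n):
            acc = out[i][j]
            for p in range(k):
                # The product of two 8-bit significands is exact in a double.
                # The sum is rounded to double and then to fp32, which only
                # approximates the single rounding of a hardware fused op.
                acc = to_f32(acc + a_r[i][p] * b_r[p][j])
            out[i][j] = acc
    return out
```

For example, `matmul_reduced([[1.0 + 2**-8, 2.0]], [[1.0], [3.0]], [[0.5]])` returns `[[7.5]]`: the element `1.0 + 2**-8` lies exactly halfway between two reduced-format values and rounds to `1.0` before the multiply, while the accumulation itself is carried out in single precision.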