CPC G06F 9/3001 (2013.01) [G06F 7/483 (2013.01); G06F 7/5443 (2013.01); G06F 9/30014 (2013.01); G06F 9/30036 (2013.01); G06F 9/3851 (2013.01); G06F 9/3888 (2023.08); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/08 (2013.01); G09G 5/393 (2013.01); G06F 1/16 (2013.01); G06F 9/30025 (2013.01); G06F 9/3013 (2013.01); G06F 17/16 (2013.01); G06F 2207/3824 (2013.01); G06N 20/00 (2019.01); G06T 15/005 (2013.01)] | 25 Claims |
1. An apparatus comprising:
an array of processing units to process instructions;
a plurality of memory interfaces, each memory interface to couple one or more of the processing units to a corresponding memory;
each processing unit in the array of processing units comprising:
execution circuitry to execute a floating-point multiply-accumulate instruction, the execution circuitry comprising:
mantissa multiplication circuitry, wherein the mantissa multiplication circuitry is shared with an integer datapath of the execution circuitry, wherein responsive to the floating-point multiply-accumulate instruction, the mantissa multiplication circuitry is to perform a multiplication operation with a mantissa value of each 16-bit floating-point data element of a first plurality of 16-bit floating-point data elements and a mantissa value of a corresponding 16-bit floating-point data element of a second plurality of 16-bit floating-point data elements to generate a corresponding plurality of mantissa results;
exponent processing circuitry, responsive to the floating-point multiply-accumulate instruction, to perform an operation with an exponent value of each 16-bit floating-point data element of the first plurality of 16-bit floating-point data elements and an exponent value of each corresponding 16-bit floating-point data element of the second plurality of 16-bit floating-point data elements to generate a corresponding plurality of exponent results;
circuitry to process the plurality of mantissa results and the plurality of exponent results to generate a corresponding floating-point product; and
adder circuitry to generate a plurality of result floating-point values, each result floating-point value comprising a sum of one or more floating-point products of a plurality of floating-point products and a corresponding accumulated floating-point value of a plurality of accumulated floating-point values.
|
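The datapath recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed circuitry, and it rests on several assumptions that the claim does not state: the 16-bit elements are taken to be IEEE 754 binary16 (the claim would equally cover other 16-bit formats), the exponent "operation" is taken to be addition, each accumulator receives exactly one product, and hardware rounding, infinities, and NaNs are not modeled. All function names (`split_fp16`, `int_multiply`, `fp16_product`, `multiply_accumulate`) are hypothetical.

```python
import struct


def split_fp16(x):
    """Split a finite binary16 value into (sign, unbiased exponent, integer significand)."""
    # Round-trip through the struct 'e' format to obtain the 16-bit encoding.
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    sign = bits >> 15
    exp_field = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp_field == 0:
        # Zero or subnormal: no implicit leading 1, fixed exponent of -14.
        return sign, -14, frac
    # Normal number: restore the implicit leading 1 (bit 10), remove the bias of 15.
    return sign, exp_field - 15, frac | 0x400


def int_multiply(a, b):
    """Stand-in for a multiplier shared with the integer datapath (assumption)."""
    return a * b


def fp16_product(a, b):
    """Form one floating-point product from separate mantissa and exponent results."""
    sa, ea, ma = split_fp16(a)
    sb, eb, mb = split_fp16(b)
    mantissa_result = int_multiply(ma, mb)  # mantissa multiplication
    exponent_result = ea + eb               # exponent processing (assumed: addition)
    sign = -1.0 if (sa ^ sb) else 1.0
    # Each significand carries a 2**10 scale, so the product carries 2**20.
    return sign * mantissa_result * 2.0 ** (exponent_result - 20)


def multiply_accumulate(a_vals, b_vals, acc_vals):
    """Add each element-wise product to its corresponding accumulated value."""
    return [acc + fp16_product(a, b)
            for a, b, acc in zip(a_vals, b_vals, acc_vals)]


if __name__ == "__main__":
    print(multiply_accumulate([1.5, -0.5], [2.0, 4.0], [1.0, 10.0]))
```

Routing the significand product through a plain integer multiply is meant only to mirror the claim's sharing of the mantissa multiplier with the integer datapath. The accumulation here runs in Python's double precision, so the results are not bit-exact with any particular hardware accumulator width.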