CPC G06F 9/3001 (2013.01) [G06F 7/483 (2013.01); G06F 7/5443 (2013.01); G06F 9/30014 (2013.01); G06F 9/30036 (2013.01); G06F 9/3851 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/08 (2013.01); G09G 5/393 (2013.01); G06F 9/30025 (2013.01); G06F 9/3013 (2013.01); G06F 17/16 (2013.01); G06F 2207/3824 (2013.01); G06N 20/00 (2019.01); G06T 15/005 (2013.01)] | 28 Claims |
1. An apparatus comprising:
an interconnect fabric;
a memory interface coupled to the interconnect fabric;
an input/output, IO, unit coupled to the interconnect fabric;
an array of multiprocessors coupled to the interconnect fabric, a multiprocessor in the array of multiprocessors to execute a mixed-precision instruction in parallel across multiple threads; and
virtualization circuitry to share the array of multiprocessors with a plurality of virtual machines, each virtual machine of the plurality of virtual machines having a dedicated slice of resources provided by the array of multiprocessors, the dedicated slice of resources including the multiprocessor in the array of multiprocessors, the multiprocessor comprising:
a plurality of registers to store packed floating-point operand values; and
execution circuitry to execute one or more of the mixed-precision instructions to perform a fused multiply-accumulate operation, the execution circuitry comprising:
a 16-bit multiplier to multiply a first 16-bit floating point source value and a second 16-bit floating point source value to generate an intermediate result; and
a 32-bit accumulator to add the intermediate result to an accumulated floating point value to generate a new accumulation result.
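
By way of illustration only, the arithmetic recited for the execution circuitry (a multiply of two 16-bit floating point source values followed by an add into a 32-bit accumulated value) can be modeled in software. The following is a minimal sketch and not the claimed hardware. The function names `to_fp16`, `to_fp32`, `mixed_precision_fma`, `packed_dot`, and `fp16_only_dot` are hypothetical. The sketch assumes IEEE 754 binary16 and binary32 formats with round-to-nearest-even, and assumes the intermediate product is kept exact before the add; the claim does not specify any of these details.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (assumed 16-bit format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value (assumed 32-bit format)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_precision_fma(a: float, b: float, acc: float) -> float:
    """Model one multiply-accumulate step: fp16 * fp16, added into an fp32 value."""
    a16, b16 = to_fp16(a), to_fp16(b)
    # The product of two binary16 values is exactly representable in binary32,
    # so the intermediate result carries no rounding error in this model.
    intermediate = a16 * b16
    # Add to the accumulated value and round once to 32 bits.
    return to_fp32(to_fp32(acc) + intermediate)


def packed_dot(a_vals, b_vals, acc: float = 0.0) -> float:
    """Accumulate products of packed operand pairs in a 32-bit accumulator."""
    for a, b in zip(a_vals, b_vals):
        acc = mixed_precision_fma(a, b, acc)
    return acc


def fp16_only_dot(a_vals, b_vals) -> float:
    """Comparison case (not in the claim): accumulate in 16 bits instead."""
    acc = 0.0
    for a, b in zip(a_vals, b_vals):
        acc = to_fp16(acc + to_fp16(to_fp16(a) * to_fp16(b)))
    return acc


ones = [1.0] * 4096
wide = packed_dot(ones, ones)       # 4096.0
narrow = fp16_only_dot(ones, ones)  # 2048.0, since 2049 is not a binary16 value
```

In this model the 32-bit accumulator reaches 4096.0, while a 16-bit accumulator stalls at 2048.0 because each further addition of 1.0 rounds back to 2048.0. That loss of small addends is one plausible motivation for pairing a 16-bit multiplier with a wider accumulator.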
|