CPC G06T 1/20 (2013.01) [G06F 5/01 (2013.01); G06F 7/501 (2013.01); G06F 7/523 (2013.01); G06F 7/5443 (2013.01); G06F 17/153 (2013.01); G06F 17/16 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01); G06F 2207/382 (2013.01); G06F 2207/4824 (2013.01)] | 20 Claims |
1. A graphics processing unit comprising:
an interface to an interconnect fabric; and
a compute unit coupled with the interface, the compute unit including circuitry configured to:
quantize elements of a floating-point tensor to convert the floating-point tensor into a dynamic fixed-point tensor, wherein to quantize an element of the floating-point tensor, the circuitry is to compute a right-shift value based on a difference between an exponent value of the element of the floating-point tensor and the exponent value of an absolute maximum value of the floating-point tensor and right-shift a mantissa of the element based on the right-shift value to generate a magnitude integer;
perform a compute operation on input including the dynamic fixed-point tensor; and
generate an output tensor via the compute operation.
|
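The quantization step recited in claim 1 can be illustrated with a minimal software sketch. It assumes IEEE 754 binary32 inputs, a sign-and-magnitude integer output, and a fixed extra shift that narrows the 24-bit mantissa to a chosen magnitude width. The names `decompose`, `quantize`, `dequantize`, and `magnitude_bits` are hypothetical and do not come from the claim, and the claimed circuitry may differ in all of these respects.

```python
import struct

FRACTION_BITS = 23    # explicit fraction bits in IEEE 754 binary32
EXPONENT_BIAS = 127   # binary32 exponent bias


def decompose(x):
    """Split a finite float32 value into (sign, biased exponent, 24-bit mantissa)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> FRACTION_BITS) & 0xFF
    fraction = bits & ((1 << FRACTION_BITS) - 1)
    if exponent == 0:
        # Zero and subnormals: no implicit leading 1, effective exponent is 1.
        return sign, 1, fraction
    return sign, exponent, fraction | (1 << FRACTION_BITS)


def quantize(values, magnitude_bits=8):
    """Quantize a non-empty list of finite floats to (sign, magnitude) pairs.

    Assumes 1 <= magnitude_bits <= 24. Returns the pairs plus one shared
    power-of-two scale exponent for the whole tensor.
    """
    parts = [decompose(v) for v in values]
    # The element with the largest absolute value also has the largest exponent.
    max_exponent = max(e for _, e, _ in parts)
    # Assumption: a constant extra shift narrows 24 mantissa bits to the target width.
    narrowing = (FRACTION_BITS + 1) - magnitude_bits
    quantized = []
    for sign, exponent, mantissa in parts:
        # Right-shift value based on the exponent difference to the absolute maximum.
        right_shift = (max_exponent - exponent) + narrowing
        quantized.append((sign, mantissa >> right_shift))
    scale_exponent = max_exponent - EXPONENT_BIAS - (magnitude_bits - 1)
    return quantized, scale_exponent


def dequantize(quantized, scale_exponent):
    """Map (sign, magnitude) pairs back to floats using the shared scale."""
    return [(-m if s else m) * 2.0 ** scale_exponent for s, m in quantized]
```

In this sketch, smaller elements lose low-order mantissa bits in proportion to their exponent distance from the absolute maximum, so every magnitude integer shares a single scale and a later compute operation could run on integers alone. Truncation is used here only for simplicity; a rounding mode would be a separate design choice.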