| CPC G06F 7/5443 (2013.01) [G06F 7/4876 (2013.01); G06F 7/49942 (2013.01); G06F 15/8046 (2013.01)] | 20 Claims |

|
1. A systolic array processor organized in rows and columns, each row comprising:
a bit reducer configured to convert 32-bit input data elements into two 21-bit input data elements, the bit reducer comprising:
first hardware circuitry configured to convert a 32-bit input data element of the 32-bit input data elements into a first 21-bit input data element, the first 21-bit input data element corresponding to a set of most significant bits of a significand portion of the 32-bit input data element, the first hardware circuitry comprising:
second hardware circuitry configured to reduce a quantity of trailing bits representing the significand portion of the 32-bit input data element to produce a first reduced significand portion of the 32-bit input data element, the first reduced significand portion corresponding to the set of most significant bits; and
third hardware circuitry configured to increase a quantity of bits representing an exponent portion of the 32-bit input data element to produce a first increased exponent portion,
wherein the first hardware circuitry produces the first 21-bit input data element based on the first reduced significand portion and the first increased exponent portion; and
fourth hardware circuitry configured to convert the 32-bit input data element into a second 21-bit input data element, the second 21-bit input data element corresponding to a set of least significant bits of the significand portion of the 32-bit input data element, the fourth hardware circuitry comprising:
fifth hardware circuitry configured to reduce a quantity of leading bits representing the significand portion of the 32-bit input data element to produce a second reduced significand portion of the 32-bit input data element, the second reduced significand portion corresponding to the set of least significant bits; and
sixth hardware circuitry configured to increase the quantity of bits representing the exponent portion of the 32-bit input data element to produce a second increased exponent portion,
wherein the fourth hardware circuitry produces the second 21-bit input data element based on the second reduced significand portion and the second increased exponent portion; and
a plurality of processing elements, wherein a first processing element of a first row of the plurality of processing elements and a first column of the plurality of processing elements is configured to:
generate a multiplier product based on the first 21-bit input data element and a weight;
adjust a bit-length of a significand portion of the multiplier product to correspond to a bit-length supported by an adder of the first processing element;
generate an addition result based on addition of the adjusted multiplier product and an input partial sum, wherein a first output partial sum of the first processing element is based on the addition result, and wherein a second output partial sum of the first processing element is based on the second 21-bit input data element;
route the first 21-bit input data element and the second 21-bit input data element to a second processing element of the first row of the plurality of processing elements and a second column of the plurality of processing elements, wherein a 21 bit-length corresponds to a maximum supported bit-length of a multiplier of the first processing element; and
route the first output partial sum and the second output partial sum to a third processing element of a second row of the plurality of processing elements and the first column of the plurality of processing elements, wherein an output of the systolic array processor is based on a plurality of outputs of the first column of the plurality of processing elements generated using reduced inputs.
|
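The data path recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed circuitry, and several details are assumptions the claim does not state: the 21-bit word is taken to be 1 sign bit, a 9-bit exponent with bias 255, and an 11-bit stored significand; the helper names (`split_fp32`, `pack21`, `unpack21`, `pe_step`) are hypothetical; infinities and NaNs are not handled; and the processing element is modeled with Python floats, so the step that adjusts the product significand to the adder width is omitted.

```python
import math
import struct

# Assumed layout of a 21-bit element (not stated in the claim):
# 1 sign bit | 9 exponent bits (bias 255) | 11 stored significand bits.
SIG_BITS = 11
EXP_BITS = 9
EXP_BIAS = 255
LOW_BITS = 23 - SIG_BITS  # trailing FP32 significand bits dropped from the high part


def fp32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_fp32(b):
    """Return the float whose binary32 bit pattern is b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def split_fp32(x):
    """Split an FP32 value into a high part and a low part with hi + lo == x.

    hi keeps the most significant significand bits (trailing bits cleared);
    lo carries the least significant bits as the exact remainder.
    """
    bits = fp32_bits(x)
    hi = bits_fp32(bits & ~((1 << LOW_BITS) - 1))
    lo = bits_fp32(bits) - hi  # exact in double precision
    return hi, lo


def pack21(v):
    """Encode v in the assumed 21-bit layout (exact for values from split_fp32)."""
    if v == 0.0:
        return 0
    sign = 1 if v < 0 else 0
    m, e = math.frexp(abs(v))               # abs(v) == m * 2**e, 0.5 <= m < 1
    frac = int((2 * m - 1) * (1 << SIG_BITS))  # fraction of the 1.f form
    exp_field = (e - 1) + EXP_BIAS          # the wider exponent covers the small low parts
    return (sign << (EXP_BITS + SIG_BITS)) | (exp_field << SIG_BITS) | frac


def unpack21(w):
    """Decode a word produced by pack21 back to a float."""
    if w == 0:
        return 0.0
    sign = -1.0 if (w >> (EXP_BITS + SIG_BITS)) & 1 else 1.0
    exp_field = (w >> SIG_BITS) & ((1 << EXP_BITS) - 1)
    frac = w & ((1 << SIG_BITS) - 1)
    return sign * (1 + frac / (1 << SIG_BITS)) * 2.0 ** (exp_field - EXP_BIAS)


def pe_step(hi, lo, weight, psum_hi, psum_lo):
    """One processing element: two multiply-accumulates, one per reduced input."""
    return psum_hi + hi * weight, psum_lo + lo * weight


if __name__ == "__main__":
    x = bits_fp32(fp32_bits(3.14159274))
    hi, lo = split_fp32(x)
    print(hex(pack21(hi)), hex(pack21(lo)))
    out_hi, out_lo = pe_step(hi, lo, -2.75, 0.0, 0.0)
    print(out_hi + out_lo == x * -2.75)
```

Under these assumptions the two reduced elements together carry all 23 stored significand bits of the original value. The low part is much smaller in magnitude than the original, which is one reason a wider exponent field is useful. Summing the two output partial sums is one plausible way to form a column output; the claim itself only requires that the output be based on the outputs of the first column.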