| CPC G06F 9/3869 (2013.01) [G06F 7/50 (2013.01); G06F 7/523 (2013.01); G06F 7/5443 (2013.01); G06F 9/3824 (2013.01); G06F 17/16 (2013.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 20/00 (2019.01)] | 20 Claims |

|
1. A hardware unit to perform a matrix multiplication between a first matrix of first data elements and a second matrix of second data elements, the hardware unit comprising:
a plurality of pipeline stages comprising:
a first pipeline stage comprising a plurality of arithmetic units, wherein each of the plurality of arithmetic units is a multiplier configured to multiply a first data element of the first matrix and a second data element of the second matrix to produce a multiplication data element, and
one or more subsequent pipeline stages following the first pipeline stage that sum the multiplication data elements, each of the one or more subsequent pipeline stages comprising one or more arithmetic units, each of the one or more arithmetic units being an adder configured to add at least two data elements produced by a previous pipeline stage to produce an addition data element;
a set of storage elements for an arithmetic unit of a pipeline stage of the plurality of pipeline stages, the set of storage elements for the arithmetic unit comprising a plurality of storage elements corresponding to each data element to be processed by that arithmetic unit, the plurality of storage elements corresponding to a data element comprising a storage element for each bit of the data element; and
clock gating control logic for the pipeline stage configured to:
receive a first gate signal corresponding to a first data element to be processed by the arithmetic unit and a second gate signal corresponding to a second data element to be processed by the arithmetic unit that indicate whether the corresponding data element to be processed by the arithmetic unit is to be treated as having a zero value, the arithmetic unit being a multiplier or an adder;
clock gate the plurality of storage elements corresponding to a data element to be processed by the arithmetic unit in response to the gate signal corresponding to the respective one of the first and the second data elements to be processed by the arithmetic unit indicating that the data element to be processed by the arithmetic unit is to be treated as having a zero value, and
provide an output gate signal for each multiplication data element or addition data element produced by an arithmetic unit that is separate from the multiplication data element or the addition data element produced by the arithmetic unit to clock gating control logic for a following pipeline stage of the one or more subsequent pipeline stages that indicates whether the multiplication data element or the addition data element that is produced by the arithmetic unit and is to be processed by the one or more arithmetic units of the following pipeline stage of the one or more subsequent pipeline stages is to be treated as having a zero value.
|
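The gate-signal propagation recited in claim 1 can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the claimed circuit. The names `Operand`, `multiply`, `add` and `dot_product` are hypothetical. The claim does not say how the first-stage gate signals are generated or how an output gate signal is derived, so the model assumes that an input is flagged when its value is zero, that a multiplier flags its output when either input is flagged, and that an adder flags its output only when both inputs are flagged. It models one row-by-column dot product whose length is a power of two.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    value: int  # data element (in hardware: one storage element per bit)
    gate: bool  # True -> treat as zero; its storage elements are clock gated


def multiply(a: Operand, b: Operand) -> Operand:
    # Assumption: a product is treated as zero if either input is flagged.
    gate = a.gate or b.gate
    return Operand(0 if gate else a.value * b.value, gate)


def add(a: Operand, b: Operand) -> Operand:
    # Assumption: a sum is treated as zero only if both inputs are flagged.
    # A flagged input contributes zero regardless of its stored value,
    # because a gated register may still hold stale data.
    va = 0 if a.gate else a.value
    vb = 0 if b.gate else b.value
    return Operand(va + vb, a.gate and b.gate)


def dot_product(row, col):
    """Return (result, count of operand register sets that were clock gated)."""
    n = len(row)
    assert n == len(col) and n > 0 and n & (n - 1) == 0, "power-of-two length"

    # First pipeline stage: one multiplier per pair of data elements.
    # Assumption: the input gate signal is asserted when the value is zero.
    pairs = [(Operand(x, x == 0), Operand(y, y == 0)) for x, y in zip(row, col)]
    gated = sum(a.gate + b.gate for a, b in pairs)
    level = [multiply(a, b) for a, b in pairs]

    # Subsequent pipeline stages: an adder tree. Each stage receives the
    # output gate signals of the previous stage alongside the data elements.
    while len(level) > 1:
        gated += sum(op.gate for op in level)
        level = [add(level[i], level[i + 1]) for i in range(0, len(level), 2)]

    return level[0].value, gated
```

For example, `dot_product([1, 0, 3, 4], [5, 6, 0, 2])` gates two operand register sets in the multiplier stage and two more in the first adder stage, while still producing the correct sum of products.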