US 12,333,274 B2
Data-type-aware clock-gating
Joshua Wayne Bowman, Austin, TX (US); Thomas A. Volpe, Austin, TX (US); Sundeep Amirineni, Austin, TX (US); Nishith Desai, Austin, TX (US); and Ron Diamant, Santa Clara, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 11, 2020, as Appl. No. 17/247,475.
Prior Publication US 2022/0188073 A1, Jun. 16, 2022
Int. Cl. G06F 7/544 (2006.01); G06F 1/3237 (2019.01)
CPC G06F 7/5443 (2013.01) [G06F 1/3237 (2013.01)]
20 Claims
 
1. An integrated circuit device comprising:
an array of processing elements arranged in rows and columns, wherein each processing element includes:
a weight register configured to store a floating-point weight value that supports a plurality of data types;
a first weight clock-gate circuit configured to clock-gate a first portion of the weight register independently from rest of the weight register based on a clock enable signal, wherein the first portion of the weight register is configured to store a first group of weight data bits, wherein the clock enable signal is generated by combining results of comparing each bit stored in the first portion of the weight register with a corresponding input bit of an input to the first portion of the weight register, and wherein the first group of weight data bits is unused for a first data type, and includes both used and unused bits for a second data type;
a feature map (FMAP) register configured to store a floating-point FMAP value that supports the plurality of data types;
a first FMAP clock-gate circuit configured to clock-gate a first portion of the FMAP register independently from rest of the FMAP register, wherein the first portion of the FMAP register is configured to store a first group of FMAP data bits that are unused for a second data type of the plurality of data types;
a multiplier configured to multiply the floating-point FMAP value with the floating-point weight value to generate a multiplication result; and
an adder configured to add the multiplication result to a partial sum input to generate a partial sum output.
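
The gating behavior recited in claim 1 can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the patented circuit: the 16/16 bit split, the choice of the low bits as the "first portion", the OR-reduction of per-bit XOR comparisons as the way the results are "combined", the reuse of the same comparison scheme for the FMAP register, and all names (`clock_enable`, `GatedRegister`, `ProcessingElement`) are hypothetical. Values are modeled as 32-bit floating-point bit patterns, and a narrower data type is modeled as one whose low 16 bits are always zero.

```python
import struct
from dataclasses import dataclass, field


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def clock_enable(stored: int, incoming: int, width: int) -> bool:
    """Compare each stored bit with its input bit (XOR), then combine the
    per-bit results (assumed here to be an OR): true if any bit differs."""
    mask = (1 << width) - 1
    return ((stored ^ incoming) & mask) != 0


@dataclass
class GatedRegister:
    """Register whose low portion is clock-gated independently of the rest."""
    low_width: int = 16   # hypothetical width of the gated first portion
    high_width: int = 16  # hypothetical width of the rest of the register
    low: int = 0
    high: int = 0
    low_clock_pulses: int = 0  # counts clock edges that reach the low portion

    def load(self, value: int) -> None:
        new_low = value & ((1 << self.low_width) - 1)
        new_high = (value >> self.low_width) & ((1 << self.high_width) - 1)
        # The gated portion is clocked only when its contents would change.
        if clock_enable(self.low, new_low, self.low_width):
            self.low = new_low
            self.low_clock_pulses += 1
        # The rest of the register is modeled as clocked on every load.
        self.high = new_high

    @property
    def value(self) -> int:
        return (self.high << self.low_width) | self.low


@dataclass
class ProcessingElement:
    """One multiply-accumulate cell with gated weight and FMAP registers."""
    weight: GatedRegister = field(default_factory=GatedRegister)
    fmap: GatedRegister = field(default_factory=GatedRegister)

    def step(self, weight_bits: int, fmap_bits: int, psum_in: float) -> float:
        self.weight.load(weight_bits)
        self.fmap.load(fmap_bits)
        product = bits_to_float(self.weight.value) * bits_to_float(self.fmap.value)
        return product + psum_in  # partial sum output


pe = ProcessingElement()
# 1.5 and 2.0 have all-zero low 16 bits, so the gated portions stay idle.
psum_out = pe.step(float_to_bits(1.5), float_to_bits(2.0), 1.0)
```

In this model, loading values whose low bits match what is already stored leaves `low_clock_pulses` unchanged, which is the power-saving effect the clock-gate is meant to capture; a value with nonzero low bits, such as the single-precision encoding of 0.1, causes the gated portion to be clocked once.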