US 11,789,729 B2
Systems and methods for computing dot products of nibbles in two tile operands
Raanan Sade, Kibutz Sarid (IL); Simon Rubanovich, Haifa (IL); Amit Gradstein, Binyamina (IL); Zeev Sperber, Zichron Yackov (IL); Alexander Heinecke, San Jose, CA (US); Robert Valentine, Kiryat Tivon (IL); Mark J. Charney, Lexington, MA (US); Bret Toll, Hillsboro, OR (US); Jesus Corbal, King City, OR (US); and Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Dec. 29, 2017, as Appl. No. 15/858,916.
Prior Publication US 2019/0042235 A1, Feb. 7, 2019
Int. Cl. G06F 9/30 (2018.01); G06F 9/38 (2018.01)
CPC G06F 9/3001 (2013.01) [G06F 9/3005 (2013.01); G06F 9/3016 (2013.01); G06F 9/30036 (2013.01); G06F 9/30043 (2013.01); G06F 9/30076 (2013.01); G06F 9/30109 (2013.01); G06F 9/30123 (2013.01); G06F 9/30145 (2013.01); G06F 9/383 (2013.01); G06F 9/3824 (2013.01)]
21 Claims
1. A processor comprising:
decode circuitry configured to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices to contain doubleword elements; and
execution circuitry configured to execute the decoded single tile dot product instruction by causing a grid of fused multiply accumulate circuits, comprising a set of multipliers, an accumulator, and a saturation circuit, to perform a flow K times for each element (M,N) of the identified destination matrix, the flow comprising:
generating eight products by multiplying each nibble of a doubleword (32-bit) element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix with a corresponding multiplier of the set of multipliers;
accumulating the eight products with previous contents of the doubleword element (M,N) of the identified destination matrix to generate a sum by the accumulator;
saturating the sum to generate a saturated sum by the saturation circuit when the sum is beyond a range of an integer; and
storing the saturated sum at element (M,N) of the identified destination matrix.
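The flow recited in claim 1 can be illustrated with a short software model. This is only a sketch and is not taken from the patent: the function names (`nibbles`, `saturate_int32`, `tile_dot_product_nibbles`) are hypothetical, the matrices are modeled as nested Python lists, and two details the claim leaves open are treated as assumptions. First, nibbles are interpreted as signed 4-bit values by default, with an unsigned option. Second, "a range of an integer" is taken to mean the signed 32-bit range. The model saturates and stores after each of the K passes, following the order of the recited steps.

```python
# Assumption: the destination holds signed 32-bit integers.
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def nibbles(dword, signed=True):
    """Split a 32-bit doubleword into eight 4-bit nibbles, least significant first."""
    out = []
    for i in range(8):
        n = (dword >> (4 * i)) & 0xF
        if signed and n >= 8:
            n -= 16  # assumption: two's-complement 4-bit interpretation
        out.append(n)
    return out


def saturate_int32(x):
    """Clamp x to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def tile_dot_product_nibbles(c, a, b, signed=True):
    """Update the M-by-N matrix c in place from M-by-K a and K-by-N b.

    a and b hold 32-bit doubleword values; c holds signed 32-bit accumulators.
    """
    m_rows, k_dim, n_cols = len(a), len(b), len(b[0])
    for m in range(m_rows):
        for n in range(n_cols):
            for k in range(k_dim):  # the flow runs K times per element (m, n)
                # Eight products, one per pair of corresponding nibbles.
                products = [
                    x * y
                    for x, y in zip(nibbles(a[m][k], signed), nibbles(b[k][n], signed))
                ]
                # Accumulate with the previous contents, saturate, store.
                c[m][n] = saturate_int32(c[m][n] + sum(products))
    return c
```

For example, with a 1-by-1 case where every nibble of the first source is 1 and every nibble of the second source is 2, `tile_dot_product_nibbles([[5]], [[0x11111111]], [[0x22222222]])` adds eight products of 2 to the previous destination value 5. Hardware would compute the eight products in parallel across the set of multipliers; the sequential loops here only model the arithmetic result.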