| CPC G06F 9/30036 (2013.01) [G06F 7/485 (2013.01); G06F 7/4876 (2013.01); G06F 7/762 (2013.01); G06F 9/3001 (2013.01); G06F 9/30032 (2013.01); G06F 9/30043 (2013.01); G06F 9/30109 (2013.01); G06F 9/30112 (2013.01); G06F 9/30134 (2013.01); G06F 9/30145 (2013.01); G06F 9/30149 (2013.01); G06F 9/3016 (2013.01); G06F 9/30185 (2013.01); G06F 9/30196 (2013.01); G06F 9/3818 (2013.01); G06F 9/3836 (2013.01); G06F 17/16 (2013.01); G06F 2212/454 (2013.01)] | 30 Claims |

|
1. An apparatus comprising:
a first one or more registers to store a first M row by K column (M×K) matrix,
a second one or more registers to store a second K row by N column (K×N) matrix,
a third one or more registers to store a third M row by N column (M×N) matrix,
decode circuitry to decode a single instruction, the single instruction having fields to indicate the first (M×K) matrix as a first operand for execution of the single instruction, the second (K×N) matrix as a second operand for execution of the single instruction, and the third (M×N) matrix as a third operand for execution of the single instruction, the first M×K matrix having data elements of a first size, the second K×N matrix having data elements of the first size, and the third M×N matrix having data elements of a second size four times the first size; and
execution circuitry, coupled with the decode circuitry, the execution circuitry to perform operations corresponding to the instruction, including to:
for each row of the first M×K matrix, and each column of the second K×N matrix:
generate a dot-product from all data elements of the row of the first M×K matrix and all data elements of the column of the second K×N matrix; and
accumulate the dot-product with a data element from a corresponding row and a corresponding column of the third M×N matrix.
|
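As an illustration only, the operation recited in claim 1 can be modeled in software as a matrix multiply-accumulate: for each row of the first matrix and each column of the second, a dot-product over all K element pairs is added to the corresponding element of the third matrix. The sketch below is not the claimed circuitry. The function name `matmul_accumulate` and the list-of-lists layout are hypothetical. Plain Python integers stand in for the data elements, so the claimed size relationship (accumulator elements four times the size of the source elements, for example 8-bit sources with 32-bit accumulators) and any overflow behavior are not modeled.

```python
def matmul_accumulate(a, b, c):
    """Return C + A x B, where a is M x K, b is K x N and c is M x N.

    Hypothetical software model of the claimed operation; element
    widths and overflow handling are deliberately left out.
    """
    m_rows, k_depth, n_cols = len(a), len(b), len(b[0])
    # Shape checks: a must be M x K, c must be M x N.
    assert all(len(row) == k_depth for row in a)
    assert len(c) == m_rows and all(len(row) == n_cols for row in c)

    result = [row[:] for row in c]  # copy so the input accumulator is untouched
    for m in range(m_rows):          # each row of the first matrix
        for n in range(n_cols):      # each column of the second matrix
            # Dot-product over all K elements of the row and the column.
            dot = sum(a[m][k] * b[k][n] for k in range(k_depth))
            # Accumulate with the element at the corresponding row and column.
            result[m][n] += dot
    return result


a = [[1, 2, 3], [4, 5, 6]]      # M=2, K=3
b = [[1, 0], [0, 1], [2, 2]]    # K=3, N=2
c = [[10, 20], [30, 40]]        # M=2, N=2
print(matmul_accumulate(a, b, c))  # [[17, 28], [46, 57]]
```

In this example the first output element is 10 + (1·1 + 2·0 + 3·2) = 17, which matches the generate-then-accumulate order of the two execution steps in the claim.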