CPC G06F 9/3001 (2013.01) [G06F 9/30036 (2013.01); G06F 9/30038 (2023.08); G06F 9/30145 (2013.01); G06F 9/3861 (2013.01); G06F 9/3865 (2013.01)] | 24 Claims |
1. A processor comprising:
a decoder to decode a matrix multiplication instruction having a first field associated with a first source matrix, a second field associated with a second source matrix, a destination field, and an opcode to identify the matrix multiplication instruction; and
an execution unit coupled to the decoder, the execution unit to perform operations in response to the matrix multiplication instruction, the operations including:
partitioning the first source matrix into a first plurality of tiles, each tile in the first plurality of tiles comprising a specified number of non-overlapping data elements, and
partitioning the second source matrix into a second plurality of tiles, each tile in the second plurality of tiles comprising a specified number of non-overlapping data elements,
the execution unit comprising a fused matrix multiplication and addition logic to perform parallel fused multiply-accumulate operations using data elements from a first tile of the first plurality of tiles and data elements from a second tile of the second plurality of tiles,
at least one of the parallel fused multiply-accumulate operations is to:
multiply data elements from the first tile and corresponding data elements from the second tile to generate a plurality of products, and add one or more of the plurality of products to a corresponding data element from an accumulation matrix to generate a corresponding result value in a result matrix.
|
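For illustration only, the operations recited in claim 1 can be sketched in software. The sketch below is not the claimed hardware and is not taken from the patent. It assumes square tiles of a caller-chosen size and matrix dimensions that are exact multiples of that size, and every function name (`partition_into_tiles`, `tile_fma`, `tiled_matmul_accumulate`) is hypothetical. Python arithmetic also does not model the single-rounding behavior usually implied by a "fused" multiply-accumulate, and the loops run sequentially where the claimed logic operates in parallel.

```python
def partition_into_tiles(matrix, tile_rows, tile_cols):
    """Split a matrix (list of rows) into non-overlapping tiles.

    Returns a dict mapping (tile_row, tile_col) to a tile (list of rows).
    Assumption: dimensions are exact multiples of the tile size.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("matrix dimensions must be multiples of the tile size")
    tiles = {}
    for tr in range(rows // tile_rows):
        for tc in range(cols // tile_cols):
            tiles[(tr, tc)] = [
                row[tc * tile_cols:(tc + 1) * tile_cols]
                for row in matrix[tr * tile_rows:(tr + 1) * tile_rows]
            ]
    return tiles


def tile_fma(tile_a, tile_b, acc_tile):
    """Return acc_tile + tile_a x tile_b for one pair of tiles.

    Each output element depends only on its own accumulator value and its
    own set of products, so hardware could compute all of them in parallel.
    """
    inner = len(tile_b)
    return [
        [
            acc_tile[i][j] + sum(tile_a[i][k] * tile_b[k][j] for k in range(inner))
            for j in range(len(tile_b[0]))
        ]
        for i in range(len(tile_a))
    ]


def tiled_matmul_accumulate(a, b, acc, tile=2):
    """Compute acc + a x b tile by tile (hypothetical helper)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions of a and b must match")
    a_tiles = partition_into_tiles(a, tile, tile)
    b_tiles = partition_into_tiles(b, tile, tile)
    result_tiles = partition_into_tiles(acc, tile, tile)
    n_i, n_k, n_j = len(a) // tile, len(a[0]) // tile, len(b[0]) // tile

    for ti in range(n_i):
        for tj in range(n_j):
            for tk in range(n_k):
                # Accumulate the product of one tile pair into the result tile.
                result_tiles[(ti, tj)] = tile_fma(
                    a_tiles[(ti, tk)], b_tiles[(tk, tj)], result_tiles[(ti, tj)]
                )

    # Reassemble the result tiles into a full result matrix.
    result = []
    for ti in range(n_i):
        for r in range(tile):
            row = []
            for tj in range(n_j):
                row.extend(result_tiles[(ti, tj)][r])
            result.append(row)
    return result
```

Calling `tiled_matmul_accumulate(a, b, acc, tile=2)` on 4x4 inputs partitions each source matrix into four 2x2 tiles and yields the same values as an untiled computation of `acc + a x b`.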