US 12,204,898 B2
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems
Edward T. Grochowski, San Jose, CA (US); Asit K. Mishra, Hillsboro, OR (US); Robert Valentine, Kiryat Tivon (IL); Mark J. Charney, Lexington, MA (US); and Simon C. Steely, Jr., Hudson, NH (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Aug. 30, 2023, as Appl. No. 18/240,287.
Application 18/240,287 is a continuation of application No. 18/220,225, filed on Jul. 10, 2023, granted, now 12,050,912.
Application 18/220,225 is a continuation of application No. 17/362,854, filed on Jun. 29, 2021, granted, now 11,698,787, issued on Jul. 11, 2023.
Application 17/362,854 is a continuation of application No. 16/398,200, filed on Apr. 29, 2019, granted, now 11,048,508, issued on Jun. 29, 2021.
Application 16/398,200 is a continuation of application No. 15/201,442, filed on Jul. 2, 2016, granted, now 10,275,243, issued on Apr. 30, 2019.
Prior Publication US 2023/0409318 A1, Dec. 21, 2023
Int. Cl. G06F 9/30 (2018.01); G06F 9/38 (2018.01)
CPC G06F 9/3001 (2013.01) [G06F 9/30036 (2013.01); G06F 9/30038 (2023.08); G06F 9/30145 (2013.01); G06F 9/3861 (2013.01); G06F 9/3865 (2013.01)]
24 Claims
 
1. A processor comprising:
a decoder to decode a matrix multiplication instruction having a first field associated with a first source matrix, a second field associated with a second source matrix, a destination field, and an opcode to identify the matrix multiplication instruction; and
an execution unit coupled to the decoder, the execution unit to perform operations in response to the matrix multiplication instruction, the operations including:
partitioning the first source matrix into a first plurality of tiles, each tile in the first plurality of tiles comprising a specified number of non-overlapping data elements, and
partitioning the second source matrix into a second plurality of tiles, each tile in the second plurality of tiles comprising a specified number of non-overlapping data elements,
the execution unit comprising a fused matrix multiplication and addition logic to perform parallel fused multiply-accumulate operations using data elements from a first tile of the first plurality of tiles and data elements from a second tile of the second plurality of tiles,
at least one of the parallel fused multiply-accumulate operations is to:
multiply data elements from the first tile and corresponding data elements from the second tile to generate a plurality of products, and add one or more of the plurality of products to a corresponding data element from an accumulation matrix to generate a corresponding result value in a result matrix.
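
For illustration only, the operations recited in claim 1 can be modeled in software. The following is a minimal Python sketch, not the claimed hardware: it assumes square tiles, matrix dimensions that are exact multiples of the tile size, and matrices stored as lists of rows. The function names (`partition_into_tiles`, `tile_multiply_accumulate`, `tiled_matmul`) are hypothetical and do not appear in the patent. The sketch runs sequentially and rounds after every operation, so it models neither the parallelism nor the single rounding step of a hardware fused multiply-accumulate.

```python
def partition_into_tiles(matrix, tile):
    """Split a matrix (list of rows) into non-overlapping tile x tile blocks.

    Returns a dict keyed by (tile_row, tile_col). Assumes (for this sketch)
    that both dimensions are exact multiples of `tile`.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % tile or cols % tile:
        raise ValueError("matrix dimensions must be multiples of the tile size")
    tiles = {}
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            tiles[(r0 // tile, c0 // tile)] = [
                row[c0:c0 + tile] for row in matrix[r0:r0 + tile]
            ]
    return tiles


def tile_multiply_accumulate(a_tile, b_tile, acc_tile):
    """Return acc_tile + a_tile x b_tile for one pair of tiles.

    Each result element adds a group of products to the corresponding
    accumulator element.
    """
    inner = len(b_tile)
    return [
        [
            acc_tile[i][j]
            + sum(a_tile[i][k] * b_tile[k][j] for k in range(inner))
            for j in range(len(b_tile[0]))
        ]
        for i in range(len(a_tile))
    ]


def tiled_matmul(a, b, acc, tile):
    """Compute acc + a x b tile by tile and return the result matrix."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions of a and b must match")
    a_tiles = partition_into_tiles(a, tile)
    b_tiles = partition_into_tiles(b, tile)
    c_tiles = partition_into_tiles(acc, tile)
    n_i, n_k, n_j = len(a) // tile, len(b) // tile, len(b[0]) // tile

    for i in range(n_i):
        for j in range(n_j):
            for k in range(n_k):
                # Accumulate the contribution of one tile pair.
                c_tiles[(i, j)] = tile_multiply_accumulate(
                    a_tiles[(i, k)], b_tiles[(k, j)], c_tiles[(i, j)]
                )

    # Reassemble the result tiles into a full matrix.
    result = []
    for i in range(n_i):
        for r in range(tile):
            row = []
            for j in range(n_j):
                row.extend(c_tiles[(i, j)][r])
            result.append(row)
    return result
```

In this sketch, calling `tiled_matmul(a, b, acc, tile)` with any tile size that divides the matrix dimensions yields the same result matrix as an untiled computation of `acc + a x b`, since tiling changes only the order in which products are accumulated (exactly so for integer inputs).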