| CPC G06F 17/16 (2013.01) [G06F 9/3001 (2013.01); G06F 9/30101 (2013.01); G06F 9/3016 (2013.01); G06F 9/3802 (2013.01); G06F 9/3836 (2013.01); G06F 9/3893 (2013.01)] | 20 Claims |

|
1. An apparatus comprising:
a cache; and
a graphics processing unit coupled to the cache, wherein the graphics processing unit comprises:
a first register to store elements from a first matrix that has zero and non-zero values in a compressed format,
a second register to store elements from a second matrix,
a scheduler circuit to schedule an instruction for execution, the instruction comprising fields to specify the first register, the second register, an accumulation matrix, a destination matrix, indications of a logical matrix position of the elements in at least the first matrix in a non-compressed format, and an opcode to indicate the instruction is a sparse matrix instruction and that execution circuitry including a processing engine is to select a proper subset of elements of the second register from the second matrix as an input into a multiply-accumulator circuit of the processing engine based on the indications, multiply the elements from the first matrix with corresponding elements of the proper subset of elements of the second matrix to generate products, accumulate the products with corresponding elements of the accumulation matrix to produce sums, and store the sums in corresponding elements of the destination matrix, and
the execution circuitry, including the processing engine, to execute the instruction according to the opcode,
wherein the first matrix has M rows by K columns, the second matrix has K rows by N columns, the accumulation matrix has M rows by N columns, and the instruction includes a suffix to the opcode that when set to a first value is to explicitly specify a first set of K, M, and N values, and when set to a different second value is to explicitly specify a different second set of K, M, and N values.
|
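The arithmetic recited in claim 1 can be illustrated in software. The following is a minimal sketch, not the claimed hardware: it assumes the compressed first matrix is held as per-row lists of non-zero values (`a_values`) together with their logical column positions in the non-compressed matrix (`a_positions`). The names `sparse_mma` and `SUFFIX_SHAPES`, and the specific M, K, and N values tied to each suffix, are hypothetical and are not taken from the claim.

```python
# Hypothetical suffix encoding. The claim only states that each suffix value
# explicitly specifies one set of K, M, and N values; these numbers are
# invented for illustration.
SUFFIX_SHAPES = {0: (2, 4, 2), 1: (4, 8, 4)}  # suffix -> (M, K, N)


def sparse_mma(suffix, a_values, a_positions, b, c):
    """Return D = A x B + C, with A supplied in compressed form.

    a_values[i]    -- non-zero values of row i of the M x K first matrix
    a_positions[i] -- logical column of each value in the non-compressed matrix
    b              -- dense K x N second matrix
    c              -- dense M x N accumulation matrix
    """
    m, k, n = SUFFIX_SHAPES[suffix]
    if len(a_values) != m or len(a_positions) != m:
        raise ValueError("first matrix must have M rows")
    if len(b) != k or any(len(row) != n for row in b):
        raise ValueError("second matrix must be K x N")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("accumulation matrix must be M x N")

    # Start the destination from the accumulation matrix, then add products.
    d = [list(row) for row in c]
    for i in range(m):
        for value, col in zip(a_values[i], a_positions[i]):
            if not 0 <= col < k:
                raise ValueError("position outside the K dimension")
            # Only the rows of B named by the position metadata are read:
            # this is the "proper subset" selection step.
            selected = b[col]
            for j in range(n):
                d[i][j] += value * selected[j]
    return d


# Non-compressed A would be [[0, 2, 0, 3], [1, 0, 0, 0]] (M=2, K=4).
a_values = [[2, 3], [1]]
a_positions = [[1, 3], [0]]
b = [[1, 2], [3, 4], [5, 6], [7, 8]]  # K=4, N=2
c = [[1, 1], [1, 1]]                  # M=2, N=2

d = sparse_mma(0, a_values, a_positions, b, c)
print(d)  # [[28, 33], [2, 3]]
```

In this sketch the zero elements of the first matrix are never multiplied, and row 2 of the second matrix is never read, which is the saving that selecting a proper subset provides.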