US 11,886,875 B2
Systems and methods for performing nibble-sized operations on matrix elements
Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Jonathan D. Pearce, Hillsboro, OR (US); Dan Baum, Haifa (IL); Guei-Yuan Lueh, San Jose, CA (US); Michael Espig, Newberg, OR (US); Christopher J. Hughes, Santa Clara, CA (US); Raanan Sade, Kibutz Sarid (IL); Robert Valentine, Kiryat Tivon (IL); Mark J. Charney, Lexington, MA (US); and Alexander F. Heinecke, San Jose, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Dec. 26, 2018, as Appl. No. 16/232,599.
Prior Publication US 2020/0210173 A1, Jul. 2, 2020
Int. Cl. G06F 9/30 (2018.01)

CPC G06F 9/30036 (2013.01) [G06F 9/3001 (2013.01); G06F 9/30018 (2013.01); G06F 9/30038 (2023.08)]

19 Claims

1. A processor comprising:

fetch circuitry to fetch a single instruction;

decode circuitry to decode the fetched single instruction having fields to specify an opcode, locations of first source, second source, and destination matrices that are each a single two-dimensional tile register in a matrix operations accelerator of the processor, and that an element size of each element of the first source, the second source, and the destination matrices is larger than a nibble, the opcode to indicate execution circuitry is to cause a grid of fused multiply and accumulate circuits of the matrix operations accelerator to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the single instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix; and

the execution circuitry to execute the decoded single instruction as per the opcode.
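
The per-element behavior recited in claim 1 can be illustrated in software. The following is a minimal sketch, not the claimed hardware: tiles are modeled as lists of lists of unsigned integers, and the helper names `nibble_op` and `tile_nibble_op`, the 8-bit default element size, and the wrap-around handling of results that overflow a nibble are all assumptions made for illustration. The claim does not name a specific operation or an overflow policy.

```python
from typing import Callable, List

NIBBLE_BITS = 4
NIBBLE_MASK = 0xF

Tile = List[List[int]]


def nibble_op(a: int, b: int, op: Callable[[int, int], int], elem_bits: int = 8) -> int:
    """Apply `op` independently to each pair of corresponding nibbles of a and b."""
    # Claim 1 requires the element size to be larger than a nibble.
    if elem_bits <= NIBBLE_BITS or elem_bits % NIBBLE_BITS != 0:
        raise ValueError("elem_bits must be a multiple of 4 that is greater than 4")
    result = 0
    for shift in range(0, elem_bits, NIBBLE_BITS):
        x = (a >> shift) & NIBBLE_MASK  # nibble-sized partition of the first source element
        y = (b >> shift) & NIBBLE_MASK  # corresponding partition of the second source element
        # Assumption: a result wider than 4 bits wraps modulo 16; no carry crosses partitions.
        result |= (op(x, y) & NIBBLE_MASK) << shift
    return result


def tile_nibble_op(src1: Tile, src2: Tile, op: Callable[[int, int], int], elem_bits: int = 8) -> Tile:
    """Apply nibble_op to every pair of corresponding elements of two equally shaped tiles."""
    if len(src1) != len(src2) or any(len(r1) != len(r2) for r1, r2 in zip(src1, src2)):
        raise ValueError("source tiles must have the same shape")
    return [
        [nibble_op(a, b, op, elem_bits) for a, b in zip(row1, row2)]
        for row1, row2 in zip(src1, src2)
    ]
```

With addition as the operation, `nibble_op(0x3A, 0x19, lambda x, y: x + y)` yields `0x43`: the low nibbles give 0xA + 0x9 = 0x13, which wraps to 0x3, and the high nibbles give 0x3 + 0x1 = 0x4. Each result lands in the same nibble position of the destination element that its operands occupied in the source elements.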