US 12,314,717 B2
Systems, methods, and apparatuses for dot production operations
Robert Valentine, Kiryat Tivon (IL); Dan Baum, Haifa (IL); Zeev Sperber, Zichron Yaakov (IL); Jesus Corbal, Barcelona (ES); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Bret L. Toll, Hillsboro, OR (US); Mark J. Charney, Lexington, MA (US); Menachem Adelman, Haifa (IL); Barukh Ziv, Haifa (IL); Alexander Heinecke, San Jose, CA (US); and Simon Rubanovich, Haifa (IL)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Nov. 1, 2021, as Appl. No. 17/516,023.
Application 17/516,023 is a continuation of application No. 16/486,960, granted, now 11,163,565, previously published as PCT/US2017/040534, filed on Jul. 1, 2017.
Claims priority of provisional application 62/473,732, filed on Mar. 20, 2017.
Prior Publication US 2022/0058021 A1, Feb. 24, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/30 (2018.01); G06F 7/485 (2006.01); G06F 7/487 (2006.01); G06F 7/76 (2006.01); G06F 9/38 (2018.01); G06F 17/16 (2006.01)
CPC G06F 9/30036 (2013.01) [G06F 7/485 (2013.01); G06F 7/4876 (2013.01); G06F 7/762 (2013.01); G06F 9/3001 (2013.01); G06F 9/30032 (2013.01); G06F 9/30043 (2013.01); G06F 9/30109 (2013.01); G06F 9/30112 (2013.01); G06F 9/30134 (2013.01); G06F 9/30145 (2013.01); G06F 9/30149 (2013.01); G06F 9/3016 (2013.01); G06F 9/30185 (2013.01); G06F 9/30196 (2013.01); G06F 9/3818 (2013.01); G06F 9/3836 (2013.01); G06F 17/16 (2013.01); G06F 2212/454 (2013.01)] 30 Claims
 
1. An apparatus comprising:
a first one or more registers to store a first M row by K column (M×K) matrix;
a second one or more registers to store a second K row by N column (K×N) matrix;
a third one or more registers to store a third M row by N column (M×N) matrix;
decode circuitry to decode a single instruction, the single instruction having fields to indicate the first (M×K) matrix as a first operand for execution of the single instruction, the second (K×N) matrix as a second operand for execution of the single instruction, and the third (M×N) matrix as a third operand for execution of the single instruction, the first M×K matrix having data elements of a first size, the second K×N matrix having data elements of the first size, and the third M×N matrix having data elements of a second size four times the first size; and
execution circuitry, coupled with the decode circuitry, the execution circuitry to perform operations corresponding to the instruction, including to:
for each row of the first M×K matrix, and each column of the second K×N matrix:
generate a dot-product from all data elements of the row of the first M×K matrix and all data elements of the column of the second K×N matrix; and
accumulate the dot-product with a data element from a corresponding row and a corresponding column of the third M×N matrix.
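
The per-row, per-column operation recited in claim 1 can be illustrated with a short software sketch. This is a behavioral model only, not the claimed circuitry: the function name dot_product_accumulate and the list-of-lists matrix representation are hypothetical, and the code uses unbounded Python integers, so it does not model the element sizes (the third matrix's elements being four times the size of the source elements) or any overflow or saturation behavior, none of which the claim specifies beyond the size ratio.

```python
def dot_product_accumulate(a, b, c):
    """Behavioral sketch: return C + A x B.

    a is an M x K matrix, b is a K x N matrix, c is an M x N matrix,
    each given as a list of rows. Element widths are NOT modeled
    (assumption: plain Python integers stand in for the narrow source
    elements and the four-times-wider accumulator elements).
    """
    m, k, n = len(a), len(b), len(b[0])
    # Shape checks: A must be M x K, B must be K x N, C must be M x N.
    assert all(len(row) == k for row in a)
    assert all(len(row) == n for row in b)
    assert len(c) == m and all(len(row) == n for row in c)

    result = [row[:] for row in c]  # copy so the caller's C is unchanged
    for i in range(m):              # each row of the first matrix
        for j in range(n):          # each column of the second matrix
            # Dot-product over all K elements of row i and column j.
            dot = sum(a[i][p] * b[p][j] for p in range(k))
            # Accumulate with the element at the corresponding row/column.
            result[i][j] += dot
    return result


a = [[1, 2, 3], [4, 5, 6]]      # M=2, K=3
b = [[1, 0], [0, 1], [1, 1]]    # K=3, N=2
c = [[10, 20], [30, 40]]        # M=2, N=2
out = dot_product_accumulate(a, b, c)
print(out)  # [[14, 25], [40, 51]]
```

In this example, the entry at row 0, column 0 is 10 + (1·1 + 2·0 + 3·1) = 14, which matches the "generate a dot-product, then accumulate" order of the claim.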