US 12,204,903 B2
Dual sum of quadword 16×16 multiply and accumulate
Venkateswara Madduri, Austin, TX (US); Cristina Anderson, Hillsboro, OR (US); Robert Valentine, Kiryat Tivon (IL); Mark Charney, Lexington, MA (US); and Vedvyas Shanbhogue, Austin, TX (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jun. 26, 2021, as Appl. No. 17/359,522.
Claims priority of provisional application 63/215,433, filed on Jun. 26, 2021.
Prior Publication US 2022/0413861 A1, Dec. 29, 2022
Int. Cl. G06F 9/30 (2018.01)
CPC G06F 9/30145 (2013.01) [G06F 9/3001 (2013.01); G06F 9/30036 (2013.01); G06F 9/30038 (2023.08); G06F 9/30101 (2013.01)]
20 Claims
1. An apparatus comprising:
decoder circuitry to decode a single instruction, wherein the single instruction is to include fields for an opcode, one or more fields to indicate a location of a source/destination operand, one or more fields to indicate a location of a first source operand, and one or more fields to indicate a location of a second source operand, wherein the opcode is to indicate that execution circuitry is to:
multiply values from corresponding data elements of the first and second sources,
add a first subset of the multiplied values to a first value from the source/destination operand and store in a first data element position of the source/destination operand, and
add a second subset of the multiplied values to a second value from the source/destination operand and store in a second data element position of the source/destination operand; and
execution circuitry to execute the decoded instruction according to the opcode.
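
The arithmetic recited in claim 1 can be illustrated with a minimal software sketch. It is not the patented circuitry, and several details are assumptions made only for illustration: the function name `dual_sum_mac`, the helper `wrap_i64`, the choice of the lower half of the products as the "first subset" and the upper half as the "second subset", and 64-bit two's-complement wraparound of the accumulators. The claim itself does not specify which products belong to which subset or how overflow is handled.

```python
MASK64 = (1 << 64) - 1


def wrap_i64(x):
    """Wrap an integer to signed 64-bit two's complement (assumed overflow behavior)."""
    x &= MASK64
    return x - (1 << 64) if x >= (1 << 63) else x


def dual_sum_mac(srcdst, src1, src2):
    """Hypothetical model of the claimed operation.

    srcdst: two accumulator values (the source/destination operand)
    src1, src2: equal-length sequences of data elements, e.g. signed 16-bit values
    """
    if len(srcdst) != 2:
        raise ValueError("srcdst must hold exactly two accumulator elements")
    if len(src1) != len(src2) or len(src1) % 2 != 0:
        raise ValueError("sources must have equal, even length")

    # Multiply values from corresponding data elements of the two sources.
    products = [a * b for a, b in zip(src1, src2)]

    # Assumption: first subset = lower half of the products, second subset = upper half.
    half = len(products) // 2
    first = srcdst[0] + sum(products[:half])
    second = srcdst[1] + sum(products[half:])

    # Store each sum back into its own data element position of the source/destination.
    return [wrap_i64(first), wrap_i64(second)]


result = dual_sum_mac([100, -50], [1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 2, 2, 2, 2])
print(result)  # [110, 2]
```

Here the eight products are 1, 2, 3, 4, 10, 12, 14, 16. The first four sum to 10 and are added to 100, and the last four sum to 52 and are added to -50.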