US 11,782,709 B2
Systems, apparatuses, and methods for addition of partial products
Robert Valentine, Kiryat Tivon (IL); Galina Ryvchin, Haifa (IL); Piotr Majcher, Straszyn (PL); Mark J. Charney, Lexington, MA (US); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Jesus Corbal, King City, OR (US); Milind B. Girkar, Sunnyvale, CA (US); Zeev Sperber, Zichron Yackov (IL); Simon Rubanovich, Haifa (IL); and Amit Gradstein, Binyamina (IL)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Oct. 13, 2022, as Appl. No. 17/964,964.
Application 17/964,964 is a continuation of application No. 17/487,611, filed on Sep. 28, 2021, granted, now 11,544,058.
Application 17/487,611 is a continuation of application No. 16/338,324, granted, now 11,169,802, issued on Nov. 9, 2021, previously published as PCT/US2016/057991, filed on Oct. 20, 2016.
Prior Publication US 2023/0048998 A1, Feb. 16, 2023
Int. Cl. G06F 9/30 (2018.01); G06F 7/544 (2006.01); G06F 9/38 (2018.01)
CPC G06F 9/30014 (2013.01) [G06F 7/5443 (2013.01); G06F 9/30018 (2013.01); G06F 9/30036 (2013.01); G06F 9/30105 (2013.01); G06F 9/3818 (2013.01)]
20 Claims
 
1. An apparatus comprising:
decode circuitry to decode a single instruction, the single instruction having fields to indicate an opcode, a packed destination operand, a first packed source operand, and a second packed source operand, wherein elements of the destination are 32 bits in size and elements of the first source and the second source are 16 bits in size;
a register file having a plurality of packed data registers including registers for the destination and source operands; and
execution circuitry, coupled to the decode circuitry, the execution circuitry to perform operations corresponding to the instruction, including to, for each element position of the destination:
multiply a first element from the first source and a first element from the second source to generate a first result,
multiply a second element from the first source and a second element from the second source to generate a second result,
add the first result and the second result to generate a third result;
add the third result to an element from the element position of the destination to generate a fourth result, and
store the fourth result in the element position of the destination.
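
The per-element operation recited in claim 1 can be illustrated with a short software model. This is only a sketch under stated assumptions, not the claimed hardware: the claim does not say which source elements feed a given destination position, whether elements are signed, or how overflow is handled. The model below assumes that destination position i uses source elements 2i and 2i+1, that all elements are signed two's-complement values, and that the 32-bit result wraps rather than saturates. The names `to_signed` and `dot_accumulate` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dot_accumulate(dest, src1, src2):
    """Model of the claimed operation on lists of integers.

    dest holds 32-bit elements; src1 and src2 hold 16-bit elements.
    Assumption (not stated in the claim): destination position i pairs with
    source elements 2*i and 2*i + 1, values are signed, and results wrap.
    """
    if len(src1) != 2 * len(dest) or len(src2) != 2 * len(dest):
        raise ValueError("each source needs two 16-bit elements per destination element")

    out = []
    for i, acc in enumerate(dest):
        # Multiply a first element from each source: first result.
        first = to_signed(src1[2 * i], 16) * to_signed(src2[2 * i], 16)
        # Multiply a second element from each source: second result.
        second = to_signed(src1[2 * i + 1], 16) * to_signed(src2[2 * i + 1], 16)
        # Add the two products: third result.
        third = first + second
        # Add the existing destination element: fourth result (wrapped to 32 bits).
        fourth = to_signed(to_signed(acc, 32) + third, 32)
        # Store the fourth result in the same destination position.
        out.append(fourth)
    return out


if __name__ == "__main__":
    # Position 0: 10 + 1*5 + 2*6 = 27; position 1: -5 + 3*7 + 4*8 = 48
    print(dot_accumulate([10, -5], [1, 2, 3, 4], [5, 6, 7, 8]))  # prints [27, 48]
```

Returning a new list instead of updating `dest` in place keeps the model easy to test; hardware as claimed would write each fourth result back to the destination register.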