US 12,229,554 B2
BFLOAT16 fused multiply instructions
Alexander Heinecke, San Jose, CA (US); Menachem Adelman, Haifa (IL); Robert Valentine, Kiryat Tivon (IL); Zeev Sperber, Zikhron Yaakov (IL); Amit Gradstein, Binyamina (IL); Mark Charney, Lexington, MA (US); Evangelos Georganas, San Mateo, CA (US); Dhiraj Kalamkar, Bangalore (IN); Christopher Hughes, Santa Clara, CA (US); and Cristina Anderson, Hillsboro, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Aug. 31, 2021, as Appl. No. 17/463,405.
Prior Publication US 2023/0067810 A1, Mar. 2, 2023
Int. Cl. G06F 9/30 (2018.01); G06F 7/544 (2006.01)
CPC G06F 9/30014 (2013.01) [G06F 7/5443 (2013.01); G06F 9/30036 (2013.01); G06F 9/30038 (2023.08); G06F 9/30145 (2013.01)]
30 Claims
1. An apparatus comprising:
decoder circuitry to decode a single instruction, the single instruction to include fields for an opcode, an identification of location of a packed data source/destination operand (a first source), an identification of a location of a second packed data source operand, an identification of a location of a third packed data source operand, and an identification of location of a packed data source/destination operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform a BF16 value fused multiply-accumulate operation using the first, second, and third source operands and store a result in a corresponding data element position of the source/destination operand; and
execution circuitry to execute the decoded single instruction according to the opcode.
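
The per-element behavior recited in claim 1 can be illustrated with a small software model. This is a minimal sketch, not the claimed hardware. It assumes that "fused" means the product and sum are computed exactly and rounded once to BF16 (8 significant bits, round to nearest even). It also assumes the operand orderings follow the 132/213/231 convention of existing x86 FMA instructions. The names `round_to_bf16`, `bf16_fma_packed`, and `ORDERINGS` are hypothetical, and subnormals, overflow, infinities, and NaNs are not modeled.

```python
from fractions import Fraction
import math

def round_to_bf16(x: Fraction) -> float:
    """Round an exact rational to 8 significant bits (BF16 precision),
    ties to even. Assumption: exponent range is unbounded here."""
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    ax = abs(x)
    # Find e such that 2**e <= ax < 2**(e + 1).
    e = ax.numerator.bit_length() - ax.denominator.bit_length()
    if Fraction(2) ** e > ax:
        e -= 1
    # Scale so the significand lies in [128, 256), then round once.
    m = round(ax / Fraction(2) ** (e - 7))  # Fraction rounds half to even
    return sign * math.ldexp(m, e - 7)

# Assumed orderings (multiplicand, multiplier, addend) as indices into
# (source/destination, second source, third source).
ORDERINGS = {"132": (0, 2, 1), "213": (1, 0, 2), "231": (1, 2, 0)}

def bf16_fma_packed(src_dst, src2, src3, order="231"):
    """Per data element position, compute a*b + c with a single rounding
    and return the new contents of the source/destination operand."""
    ia, ib, ic = ORDERINGS[order]
    result = []
    for lanes in zip(src_dst, src2, src3):
        ops = [Fraction(v) for v in lanes]  # exact conversion from float
        result.append(round_to_bf16(ops[ia] * ops[ib] + ops[ic]))
    return result
```

With inputs that are already exact BF16 values, `bf16_fma_packed([-1.0234375], [1.0078125], [1.015625], order="231")` keeps the low-order product bit that a separately rounded multiply followed by an add would lose, which is the practical difference between a fused and an unfused operation.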