US 12,073,214 B2
Systems, apparatuses, and methods for chained fused multiply add
Jesus Corbal, King City, OR (US); Robert Valentine, Kiryat Tivon (IL); Roman S. Dubtsov, Novosibirsk (RU); Nikita A. Shustrov, Novosibirsk (RU); Mark J. Charney, Lexington, MA (US); Dennis R. Bradford, Portland, OR (US); Milind B. Girkar, Sunnyvale, CA (US); Edward T. Grochowski, San Jose, CA (US); Thomas D. Fletcher, Sherwood, OR (US); and Warren E. Ferguson, Beaverton, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Sep. 23, 2022, as Appl. No. 17/952,001.
Application 17/952,001 is a continuation of application No. 17/107,134, filed on Nov. 30, 2020, granted, now 11,487,541.
Application 17/107,134 is a continuation of application No. 16/169,456, filed on Oct. 24, 2018, granted, now 10,853,065, issued on Dec. 1, 2020.
Application 16/169,456 is a continuation of application No. 15/299,420, filed on Oct. 20, 2016, granted, now 10,146,535, issued on Dec. 4, 2018.
Prior Publication US 2023/0083705 A1, Mar. 16, 2023
Int. Cl. G06F 9/30 (2018.01); G06F 7/483 (2006.01); G06F 7/544 (2006.01); G06F 9/38 (2018.01)
CPC G06F 9/3001 (2013.01) [G06F 7/483 (2013.01); G06F 7/5443 (2013.01); G06F 9/30036 (2013.01); G06F 9/30109 (2013.01); G06F 9/30112 (2013.01); G06F 9/3893 (2013.01)]
23 Claims
 
1. An apparatus comprising:
a memory controller;
an interconnect fabric coupled to the memory controller; and
a plurality of data processing circuits coupled to the interconnect fabric, the plurality of data processing circuits to perform operations, at least one data processing circuit comprising:
operand storage to store a first plurality of source data elements at a first precision and a second plurality of source data elements at a second precision;
execution circuitry comprising a plurality of multiply-accumulate circuits to execute a plurality of fused multiply-accumulate (FMA) instructions to perform FMA operations using at least a subset of the first plurality of source data elements and the second plurality of source data elements to generate a plurality of result data elements at the second precision, a multiply-accumulate circuit of the plurality of multiply-accumulate circuits comprising:
multiplier circuitry to multiply each source data element of a respective subset of source data elements of the first plurality of source data elements and a corresponding source data element of a respective subset of source data elements of the second plurality of source data elements to generate a plurality of products; and
adder circuitry to add the plurality of products and an initial accumulation value, if provided, or an accumulation value from a prior multiply-accumulate circuit in a chain of multiply-accumulate circuits to generate a new accumulation value,
wherein the new accumulation value is a result accumulation value if the multiply-accumulate circuit is a last multiply-accumulate circuit in the chain of multiply-accumulate circuits or the new accumulation value is to be provided to a next multiply-accumulate circuit in the chain.
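Claim 1 recites a chain of multiply-accumulate circuits. Each circuit multiplies pairs of first-precision source elements, adds the products to either an initial accumulation value (if provided) or the previous circuit's accumulation value, and passes the new value down the chain. The last circuit's value is the result. The Python sketch below models that dataflow in software. It is only an illustration under stated assumptions, not the patented hardware or Intel's implementation. It assumes the first precision is IEEE binary32, emulated by rounding through `struct`, and the second precision is binary64 (Python's `float`). It also assumes each circuit consumes a fixed number of element pairs (`lanes_per_circuit`, a hypothetical parameter). "Fused" is modeled as exact summation with `fractions.Fraction` followed by one rounding per circuit. All function names are hypothetical.

```python
import struct
from fractions import Fraction
from typing import Optional, Sequence


def to_first_precision(x: float) -> float:
    """Round x to IEEE binary32 (assumed 'first precision'); return it as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mac_circuit(a_subset: Sequence[float], b_subset: Sequence[float],
                acc_in: Optional[float]) -> float:
    """Hypothetical model of one multiply-accumulate circuit in the chain.

    Multiplies each pair of source elements, adds all products to the incoming
    accumulation value (if any), and rounds once to binary64 ('second precision').
    """
    exact = Fraction(0) if acc_in is None else Fraction(acc_in)
    for a, b in zip(a_subset, b_subset):
        exact += Fraction(a) * Fraction(b)  # exact product and exact running sum
    return float(exact)  # single rounding to the nearest binary64 value


def chained_fma(a: Sequence[float], b: Sequence[float], lanes_per_circuit: int,
                initial_acc: Optional[float] = None) -> float:
    """Feed consecutive slices of a and b through a chain of MAC circuits."""
    if len(a) != len(b):
        raise ValueError("source vectors must have the same length")
    if lanes_per_circuit < 1:
        raise ValueError("lanes_per_circuit must be at least 1")
    # Source elements are stored at the first precision (assumed binary32).
    a32 = [to_first_precision(x) for x in a]
    b32 = [to_first_precision(x) for x in b]
    acc = initial_acc  # None means no initial accumulation value was provided
    for start in range(0, len(a32), lanes_per_circuit):
        stop = start + lanes_per_circuit
        # Each circuit's output is the next circuit's input accumulation value;
        # the last circuit's output is the result accumulation value.
        acc = mac_circuit(a32[start:stop], b32[start:stop], acc)
    return 0.0 if acc is None else float(acc)
```

The per-circuit rounding is visible in the chain. For example, `chained_fma([1.0, 2.0**-60, -1.0], [1.0, 1.0, 1.0], 3)` returns `2.0**-60` because one circuit sums all three products and rounds once. With `lanes_per_circuit=1` it returns `0.0`, because the intermediate value `1 + 2**-60` rounds to `1.0` before it reaches the next circuit.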