US 12,217,053 B2
Instructions and logic to perform floating point and integer operations for machine learning
Himanshu Kaul, Portland, OR (US); Mark A. Anders, Hillsboro, OR (US); Sanu K. Mathew, Hillsboro, OR (US); Anbang Yao, Beijing (CN); Joydeep Ray, Folsom, CA (US); Ping T. Tang, Edison, NJ (US); Michael S. Strickland, Sunnyvale, CA (US); Xiaoming Chen, Shanghai (CN); Tatiana Shpeisman, Menlo Park, CA (US); Abhishek R. Appu, El Dorado Hills, CA (US); Altug Koker, El Dorado Hills, CA (US); Kamal Sinha, Rancho Cordova, CA (US); Balaji Vembu, Folsom, CA (US); Nicolas C. Galoppo Von Borries, Portland, OR (US); Eriko Nurvitadhi, Hillsboro, OR (US); Rajkishore Barik, Santa Clara, CA (US); Tsung-Han Lin, Campbell, CA (US); Vasanth Ranganathan, El Dorado Hills, CA (US); and Sanjeev Jahagirdar, Folsom, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Dec. 4, 2023, as Appl. No. 18/528,340.
Application 18/528,340 is a continuation of application No. 17/967,283, filed on Oct. 17, 2022.
Application 17/967,283 is a continuation of application No. 17/834,482, filed on Jun. 7, 2022, granted, now 11,720,355, issued on Aug. 8, 2023.
Application 17/834,482 is a continuation of application No. 17/305,355, filed on Jul. 6, 2021, granted, now 11,360,767, issued on Jun. 14, 2022.
Application 17/305,355 is a continuation of application No. 17/169,232, filed on Feb. 5, 2021, granted, now 11,080,046, issued on Aug. 3, 2021.
Application 17/169,232 is a continuation of application No. 17/115,989, filed on Dec. 9, 2020.
Application 17/115,989 is a continuation of application No. 16/432,402, filed on Jun. 5, 2019, granted, now 11,169,799, issued on Nov. 9, 2021.
Application 16/432,402 is a continuation of application No. 15/819,152, filed on Nov. 21, 2017, granted, now 10,353,706, issued on Jul. 16, 2019.
Application 15/819,152 is a continuation of application No. 15/787,129, filed on Oct. 18, 2017, granted, now 10,474,458, issued on Nov. 12, 2019.
Claims priority of provisional application 62/491,699, filed on Apr. 28, 2017.
Prior Publication US 2024/0184572 A1, Jun. 6, 2024
Int. Cl. G06F 9/30 (2018.01); G06F 7/483 (2006.01); G06F 7/544 (2006.01); G06F 9/38 (2018.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01); G09G 5/393 (2006.01); G06F 1/16 (2006.01); G06F 17/16 (2006.01); G06N 20/00 (2019.01); G06T 15/00 (2011.01)
CPC G06F 9/3001 (2013.01) [G06F 7/483 (2013.01); G06F 7/5443 (2013.01); G06F 9/30014 (2013.01); G06F 9/30036 (2013.01); G06F 9/3851 (2013.01); G06F 9/3888 (2023.08); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/08 (2013.01); G09G 5/393 (2013.01); G06F 1/16 (2013.01); G06F 9/30025 (2013.01); G06F 9/3013 (2013.01); G06F 17/16 (2013.01); G06F 2207/3824 (2013.01); G06N 20/00 (2019.01); G06T 15/005 (2013.01)]
25 Claims
1. An apparatus comprising:
an array of processing units to process instructions;
a plurality of memory interfaces, each memory interface to couple one or more of the processing units to a corresponding memory;
each processing unit in the array of processing units comprising:
execution circuitry to execute a floating-point multiply-accumulate instruction, the execution circuitry comprising:
mantissa multiplication circuitry, wherein the mantissa multiplication circuitry is shared with an integer datapath of the execution circuitry, wherein responsive to the floating-point multiply-accumulate instruction, the mantissa multiplication circuitry is to perform a multiplication operation with a mantissa value of each 16-bit floating-point data element of a first plurality of 16-bit floating-point data elements and a mantissa value of a corresponding 16-bit floating-point data element of a second plurality of 16-bit floating-point data elements to generate a corresponding plurality of mantissa results;
exponent processing circuitry, responsive to the floating-point multiply-accumulate instruction, to perform an operation with an exponent value of each 16-bit floating-point data element of the first plurality of 16-bit floating-point data elements and an exponent value of each corresponding 16-bit floating-point data element of the second plurality of 16-bit floating-point data elements to generate a corresponding plurality of exponent results;
circuitry to process the plurality of mantissa results and the plurality of exponent results to generate a corresponding floating-point product; and
adder circuitry to generate a plurality of result floating-point values, each result floating-point value comprising a sum of one or more floating-point products of a plurality of floating-point products and a corresponding accumulated floating-point value of a plurality of accumulated floating-point values.
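
The arithmetic recited in claim 1 can be illustrated in software. The following is a minimal sketch, not the claimed circuitry: it assumes the 16-bit floating-point elements use the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits), which the claim does not specify. It multiplies the mantissas as integers, adds the exponents separately, combines the two into an exact product, and then adds each product to a corresponding accumulated value. The function names (`fp16_fields`, `fp16_product`, `multiply_accumulate`), the one-product-per-result grouping, and the rounding of each result to a Python float are all assumptions made for illustration.

```python
import struct
from fractions import Fraction

EXP_BIAS = 15      # binary16 exponent bias (assumed format)
FRAC_BITS = 10     # binary16 fraction width (assumed format)


def fp16_fields(x):
    """Round x to binary16 and return its (sign, exponent, fraction) bit fields."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> FRAC_BITS) & 0x1F, bits & 0x3FF


def fp16_product(a, b):
    """Exact product of two binary16 values as a Fraction.

    The mantissa path is a plain integer multiply and the exponent path is
    a plain integer add, mirroring the split described in the claim.
    """
    sign_a, exp_a, frac_a = fp16_fields(a)
    sign_b, exp_b, frac_b = fp16_fields(b)
    if exp_a == 0x1F or exp_b == 0x1F:
        raise ValueError("infinities and NaNs are outside this sketch")

    # Normal numbers carry an implicit leading 1; subnormals (exponent
    # field 0) do not, and are scaled as if the exponent field were 1.
    mant_a = frac_a | (0x400 if exp_a else 0)
    mant_b = frac_b | (0x400 if exp_b else 0)
    unb_a = (exp_a if exp_a else 1) - EXP_BIAS - FRAC_BITS
    unb_b = (exp_b if exp_b else 1) - EXP_BIAS - FRAC_BITS

    mantissa_result = mant_a * mant_b      # integer multiply, at most 22 bits
    exponent_result = unb_a + unb_b        # integer add of unbiased exponents

    value = Fraction(mantissa_result) * Fraction(2) ** exponent_result
    return -value if (sign_a ^ sign_b) else value


def multiply_accumulate(first, second, accumulated):
    """Return [first[i] * second[i] + accumulated[i]] rounded to Python floats.

    One product per result is an assumption; the claim allows a sum of one
    or more products per accumulated value.
    """
    return [
        float(fp16_product(a, b) + Fraction(acc))
        for a, b, acc in zip(first, second, accumulated)
    ]
```

For example, `multiply_accumulate([1.5, -0.5], [2.0, 0.25], [1.0, 1.0])` yields `[4.0, 0.875]`. Because the mantissa step is an ordinary integer multiplication, the same multiplier could in principle serve an integer datapath, which is the sharing the claim describes; how the hardware realizes that sharing is not modeled here.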