US 12,141,578 B2
Instructions and logic to perform floating point and integer operations for machine learning
Himanshu Kaul, Portland, OR (US); Mark A. Anders, Hillsboro, OR (US); Sanu K. Mathew, Hillsboro, OR (US); Anbang Yao, Beijing (CN); Joydeep Ray, Folsom, CA (US); Ping T. Tang, Edison, NJ (US); Michael S. Strickland, Sunnyvale, CA (US); Xiaoming Chen, Shanghai (CN); Tatiana Shpeisman, Menlo Park, CA (US); Abhishek R. Appu, El Dorado Hills, CA (US); Altug Koker, El Dorado Hills, CA (US); Kamal Sinha, Rancho Cordova, CA (US); Balaji Vembu, Folsom, CA (US); Nicolas C. Galoppo Von Borries, Portland, OR (US); Eriko Nurvitadhi, Hillsboro, OR (US); Rajkishore Barik, Santa Clara, CA (US); Tsung-Han Lin, Campbell, CA (US); Vasanth Ranganathan, El Dorado Hills, CA (US); and Sanjeev Jahagirdar, Folsom, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Dec. 9, 2020, as Appl. No. 17/115,989.
Application 17/115,989 is a continuation of application No. 16/432,402, filed on Jun. 5, 2019, granted, now 11,169,799.
Application 16/432,402 is a continuation of application No. 15/819,152, filed on Nov. 21, 2017, granted, now 10,353,706, issued on Jul. 16, 2019.
Application 15/819,152 is a continuation of application No. 15/787,129, filed on Oct. 18, 2017, granted, now 10,474,458, issued on Nov. 12, 2019.
Claims priority of provisional application 62/491,699, filed on Apr. 28, 2017.
Prior Publication US 2021/0124579 A1, Apr. 29, 2021
Int. Cl. G06F 9/30 (2018.01); G06F 7/483 (2006.01); G06F 7/544 (2006.01); G06F 9/38 (2018.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01); G09G 5/393 (2006.01); G06F 17/16 (2006.01); G06N 20/00 (2019.01); G06T 15/00 (2011.01)
CPC G06F 9/3001 (2013.01) [G06F 7/483 (2013.01); G06F 7/5443 (2013.01); G06F 9/30014 (2013.01); G06F 9/30036 (2013.01); G06F 9/3851 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/08 (2013.01); G09G 5/393 (2013.01); G06F 9/30025 (2013.01); G06F 9/3013 (2013.01); G06F 17/16 (2013.01); G06F 2207/3824 (2013.01); G06N 20/00 (2019.01); G06T 15/005 (2013.01)] 28 Claims
1. An apparatus comprising:
  an interconnect fabric;
  a memory interface coupled to the interconnect fabric;
  an input/output, IO, unit coupled to the interconnect fabric;
  an array of multiprocessors coupled to the interconnect fabric, a multiprocessor in the array of multiprocessors to execute a mixed-precision instruction in parallel across multiple threads; and
  virtualization circuitry to share the array of multiprocessors with a plurality of virtual machines, each virtual machine of the plurality of virtual machines having a dedicated slice of resources provided by the array of multiprocessors, the dedicated slice of resources including the multiprocessor in the array of multiprocessors, the multiprocessor comprising:
    a plurality of registers to store packed floating-point operand values; and
    execution circuitry to execute one or more of the mixed-precision instructions to perform a fused multiply-accumulate operation, the execution circuitry comprising:
      a 16-bit multiplier to multiply a first 16-bit floating point source value and a second 16-bit floating point source value to generate an intermediate result; and
      a 32-bit accumulator to add the intermediate result to an accumulated floating point value to generate a new accumulation result.
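
The arithmetic recited in the last two elements (two 16-bit floating point sources multiplied, the product added into a 32-bit accumulated value) can be illustrated with a minimal software sketch. This is not the claimed circuitry. It assumes IEEE 754 binary16 sources and a binary32 accumulator, and it keeps the intermediate product unrounded before the add, which the claim does not specify. The function names (to_fp16, to_fp32, mixed_precision_fma, dot_fp16_fp32) are hypothetical and chosen only for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (assumed source format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value (assumed accumulator format)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_precision_fma(a: float, b: float, acc: float) -> float:
    """Multiply two 16-bit sources and add the product to a 32-bit accumulator.

    Assumption: the product is held without rounding. A product of two
    binary16 values fits exactly in a Python float (binary64), so the only
    rounding modelled here is at the accumulator, up to the rare
    double-rounding case of the binary64 add followed by the binary32 round.
    """
    product = to_fp16(a) * to_fp16(b)          # intermediate result
    return to_fp32(product + to_fp32(acc))     # new accumulation result


def dot_fp16_fp32(xs, ys) -> float:
    """Hypothetical helper: chain the operation over two equal-length vectors."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = mixed_precision_fma(x, y, acc)
    return acc


if __name__ == "__main__":
    # A 32-bit accumulator keeps an increment that a 16-bit one would drop.
    print(mixed_precision_fma(1.0, 1.0, 2048.0))   # 2049.0
    print(to_fp16(2048.0 + 1.0))                   # 2048.0
```

The two printed values show why a wider accumulator can matter in this sketch: 2049 is not representable in binary16, so a sum held at 16 bits would round back to 2048, while the binary32 accumulator retains it.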