US 12,314,727 B2
Optimized compute hardware for machine learning operations
Dipankar Das, Pune (IN); Roger Gramunt, Portland, OR (US); Mikhail Smelyanskiy, Burlingame, CA (US); Jesus Corbal, King City, OR (US); Dheevatsa Mudigere, Bangalore (IN); Naveen K. Mellempudi, Bangalore (IN); and Alexander F. Heinecke, San Jose, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on May 12, 2022, as Appl. No. 17/742,581.
Application 17/742,581 is a continuation of application No. 16/983,107, filed on Aug. 3, 2020, granted, now 11,334,796.
Application 16/983,107 is a continuation of application No. 15/869,564, filed on Jan. 12, 2018, granted, now 10,776,699, issued on Sep. 15, 2020.
Claims priority of application No. 201741015868 (IN), filed on May 5, 2017.
Prior Publication US 2022/0343174 A1, Oct. 27, 2022
Int. Cl. G06F 9/38 (2018.01); G06F 7/544 (2006.01); G06F 9/30 (2018.01); G06F 17/16 (2006.01)
CPC G06F 9/3888 (2023.08) [G06F 7/5443 (2013.01); G06F 9/30014 (2013.01); G06F 17/16 (2013.01); G06F 2207/382 (2013.01)]
20 Claims
1. A graphics processor comprising:
an instruction cache configured to store an instruction, wherein the instruction is a first multiply-add instruction associated with 8-bit integer input or a second multiply-add instruction associated with 16-bit integer input;
a processing resource having a single instruction multiple thread (SIMT) architecture, the processing resource configured to execute the instruction, the processing resource comprising:
a multiplier configured to multiply input associated with the instruction at one of a first plurality of bit widths;
an adder configured to add a product output from the multiplier with an accumulator value at one of a second plurality of bit widths; and
circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.
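The datapath recited in claim 1 can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the patented implementation: the opcode names (`mad_i8`, `mad_i16`), the pairing of multiplier and adder widths in `WIDTHS`, and the two's-complement wrap-around on accumulator overflow are all hypothetical choices made for illustration. The claim only requires that each unit supports a plurality of bit widths and that circuitry selects one width for each.

```python
# Hypothetical opcode table: instruction -> (multiplier width, adder width).
# The width pairings are illustrative assumptions, not taken from the claim.
WIDTHS = {
    "mad_i8": (8, 32),    # 8-bit integer inputs, 32-bit accumulate (assumed)
    "mad_i16": (16, 64),  # 16-bit integer inputs, 64-bit accumulate (assumed)
}


def wrap_signed(value, bits):
    """Reduce value modulo 2**bits and reinterpret it as two's complement."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):  # sign bit set
        value -= 1 << bits
    return value


def select_widths(opcode):
    """Model of the selection circuitry: choose one width for each unit."""
    return WIDTHS[opcode]


def multiply_add(opcode, a, b, acc):
    """Compute a * b + acc at the bit widths selected for the opcode."""
    mul_bits, add_bits = select_widths(opcode)

    # Multiplier: operands must fit the selected signed input width.
    lo = -(1 << (mul_bits - 1))
    hi = (1 << (mul_bits - 1)) - 1
    for operand in (a, b):
        if not lo <= operand <= hi:
            raise ValueError(f"{operand} does not fit in {mul_bits} signed bits")
    product = a * b

    # Adder: accumulate at the selected width. Wrap-around on overflow is an
    # assumption of this sketch; real hardware might saturate instead.
    return wrap_signed(product + acc, add_bits)
```

In this model, `multiply_add("mad_i8", -128, 127, 10)` multiplies at 8 bits and accumulates at 32 bits, while `multiply_add("mad_i16", 32767, 32767, 0)` uses the wider path. Keeping the width choice in a single lookup mirrors the claim's separation between the arithmetic units and the circuitry that configures them.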