US 11,954,063 B2
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format
Subramaniam Maiyuran, Gold River, CA (US); Shubra Marwaha, Folsom, CA (US); Ashutosh Garg, Folsom, CA (US); Supratim Pal, Bangalore (IN); Jorge Parra, El Dorado Hills, CA (US); Chandra Gurram, Folsom, CA (US); Varghese George, Folsom, CA (US); Darin Starkey, Roseville, CA (US); and Guei-Yuan Lueh, San Jose, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Feb. 17, 2023, as Appl. No. 18/170,900.
Application 18/170,900 is a continuation of application No. 17/827,067, filed on May 27, 2022, granted, now 11,709,793.
Application 17/827,067 is a continuation of application No. 17/304,092, filed on Jun. 14, 2021, granted, now 11,361,496, issued on Jun. 14, 2022.
Application 17/304,092 is a continuation of application No. PCT/US2020/022852, filed on Mar. 14, 2020.
Claims priority of provisional application 62/819,361, filed on Mar. 15, 2019.
Claims priority of provisional application 62/819,435, filed on Mar. 15, 2019.
Claims priority of provisional application 62/819,337, filed on Mar. 15, 2019.
Prior Publication US 2023/0195685 A1, Jun. 22, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 15/06 (2011.01); G06F 7/544 (2006.01); G06F 7/575 (2006.01); G06F 7/58 (2006.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06F 9/50 (2006.01); G06F 12/02 (2006.01); G06F 12/06 (2006.01); G06F 12/0802 (2016.01); G06F 12/0804 (2016.01); G06F 12/0811 (2016.01); G06F 12/0862 (2016.01); G06F 12/0866 (2016.01); G06F 12/0871 (2016.01); G06F 12/0875 (2016.01); G06F 12/0882 (2016.01); G06F 12/0888 (2016.01); G06F 12/0891 (2016.01); G06F 12/0893 (2016.01); G06F 12/0895 (2016.01); G06F 12/0897 (2016.01); G06F 12/1009 (2016.01); G06F 12/128 (2016.01); G06F 15/78 (2006.01); G06F 15/80 (2006.01); G06F 17/16 (2006.01); G06F 17/18 (2006.01); G06T 1/20 (2006.01); G06T 1/60 (2006.01); H03M 7/46 (2006.01); G06N 3/08 (2023.01)

CPC G06F 15/7839 (2013.01) [G06F 7/5443 (2013.01); G06F 7/575 (2013.01); G06F 7/588 (2013.01); G06F 9/3001 (2013.01); G06F 9/30014 (2013.01); G06F 9/30036 (2013.01); G06F 9/3004 (2013.01); G06F 9/30043 (2013.01); G06F 9/30047 (2013.01); G06F 9/30065 (2013.01); G06F 9/30079 (2013.01); G06F 9/3887 (2013.01); G06F 9/5011 (2013.01); G06F 9/5077 (2013.01); G06F 12/0215 (2013.01); G06F 12/0238 (2013.01); G06F 12/0246 (2013.01); G06F 12/0607 (2013.01); G06F 12/0802 (2013.01); G06F 12/0804 (2013.01); G06F 12/0811 (2013.01); G06F 12/0862 (2013.01); G06F 12/0866 (2013.01); G06F 12/0871 (2013.01); G06F 12/0875 (2013.01); G06F 12/0882 (2013.01); G06F 12/0888 (2013.01); G06F 12/0891 (2013.01); G06F 12/0893 (2013.01); G06F 12/0895 (2013.01); G06F 12/0897 (2013.01); G06F 12/1009 (2013.01); G06F 12/128 (2013.01); G06F 15/8046 (2013.01); G06F 17/16 (2013.01); G06F 17/18 (2013.01); G06T 1/20 (2013.01); G06T 1/60 (2013.01); H03M 7/46 (2013.01); G06F 9/3802 (2013.01); G06F 9/3818 (2013.01); G06F 9/3867 (2013.01); G06F 2212/1008 (2013.01); G06F 2212/1021 (2013.01); G06F 2212/1044 (2013.01); G06F 2212/302 (2013.01); G06F 2212/401 (2013.01); G06F 2212/455 (2013.01); G06F 2212/60 (2013.01); G06N 3/08 (2013.01); G06T 15/06 (2013.01)]

23 Claims

1. A graphics processing unit (GPU) comprising:

a single instruction, multiple thread (SIMT) multiprocessor comprising:

an instruction cache;

a shared memory coupled with the instruction cache;

circuitry coupled with the shared memory and the instruction cache, the circuitry including:

multiple texture units;

a first core including hardware to accelerate matrix operations; and

a second core configured to:

receive an instruction having multiple operands, wherein the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent; and

process the instruction using the multiple operands, wherein to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.
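
The arithmetic recited in claim 1 (multiply, add to the product, then apply a rectified linear unit) can be illustrated with a minimal software sketch. This is not the claimed hardware and is not taken from the patent: the function names (`f32_to_bf16_bits`, `bf16_bits_to_f32`, `mad_relu_bf16`), the three-operand lane layout, the round-to-nearest-even conversion, and the use of Python floats for the intermediate result are all assumptions made for illustration. The only format facts relied on are that BF16 is a sixteen-bit format with an eight-bit exponent, which leaves one sign bit and seven mantissa bits, so a BF16 value corresponds to the upper half of an IEEE 754 binary32 value.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Convert a float to a 16-bit BF16 pattern (1 sign, 8 exponent, 7 mantissa bits).

    Assumes x fits in binary32; struct.pack raises OverflowError otherwise.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: keep it a NaN after truncation by forcing a mantissa bit on.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even (assumed rounding mode, not from the claim).
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(b: int) -> float:
    """Widen a BF16 bit pattern by placing it in the upper half of a binary32."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def mad_relu_bf16(src0, src1, src2):
    """Per lane: relu(src0 * src1 + src2) on BF16 bit patterns.

    The lists stand in for SIMD lanes (hypothetical operand layout).
    """
    out = []
    for a, b, c in zip(src0, src1, src2):
        product = bf16_bits_to_f32(a) * bf16_bits_to_f32(b)  # multiply
        total = product + bf16_bits_to_f32(c)                # add to the product
        activated = total if total > 0.0 else 0.0            # rectified linear unit
        out.append(f32_to_bf16_bits(activated))
    return out


# Usage: three lanes, encoded to BF16, processed, then decoded for display.
a = [f32_to_bf16_bits(v) for v in (1.5, -2.0, 0.5)]
b = [f32_to_bf16_bits(v) for v in (2.0, 3.0, 0.5)]
c = [f32_to_bf16_bits(v) for v in (0.25, 1.0, -1.0)]
result = [bf16_bits_to_f32(r) for r in mad_relu_bf16(a, b, c)]
print(result)  # [3.25, 0.0, 0.0]
```

The sketch performs the three steps as separate Python operations for readability, whereas the claim recites a single instruction that performs them. Intermediate precision and rounding behavior in an actual implementation may differ from what is assumed here.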