US 11,709,793 B2
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format
Subramaniam Maiyuran, Gold River, CA (US); Shubra Marwaha, Folsom, CA (US); Ashutosh Garg, Folsom, CA (US); Supratim Pal, Bangalore (IN); Jorge Parra, El Dorado Hills, CA (US); Chandra Gurram, Folsom, CA (US); Varghese George, Folsom, CA (US); Darin Starkey, Roseville, CA (US); and Guei-Yuan Lueh, San Jose, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on May 27, 2022, as Appl. No. 17/827,067.
Application 17/827,067 is a continuation of application No. 17/304,092, filed on Jun. 14, 2021, granted, now 11,361,496.
Application 17/304,092 is a continuation of application No. PCT/US2020/022852, filed on Mar. 14, 2020.
Claims priority of provisional application 62/819,361, filed on Mar. 15, 2019.
Claims priority of provisional application 62/819,435, filed on Mar. 15, 2019.
Claims priority of provisional application 62/819,337, filed on Mar. 15, 2019.
Prior Publication US 2022/0365901 A1, Nov. 17, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 15/06 (2011.01); G06F 9/30 (2018.01); G06F 15/78 (2006.01); G06F 9/38 (2018.01); G06F 17/18 (2006.01); G06F 12/0802 (2016.01); G06F 7/544 (2006.01); G06F 7/575 (2006.01); G06F 12/02 (2006.01); G06F 12/0866 (2016.01); G06F 12/0875 (2016.01); G06F 12/0895 (2016.01); G06F 12/128 (2016.01); G06F 12/06 (2006.01); G06F 12/1009 (2016.01); G06T 1/20 (2006.01); G06T 1/60 (2006.01); H03M 7/46 (2006.01); G06F 12/0811 (2016.01); G06F 15/80 (2006.01); G06F 17/16 (2006.01); G06F 7/58 (2006.01); G06F 12/0871 (2016.01); G06F 12/0862 (2016.01); G06F 12/0897 (2016.01); G06F 9/50 (2006.01); G06F 12/0804 (2016.01); G06F 12/0882 (2016.01); G06F 12/0891 (2016.01); G06F 12/0893 (2016.01); G06F 12/0888 (2016.01); G06N 3/08 (2023.01)

CPC G06F 15/7839 (2013.01) [G06F 7/5443 (2013.01); G06F 7/575 (2013.01); G06F 7/588 (2013.01); G06F 9/3001 (2013.01); G06F 9/3004 (2013.01); G06F 9/30014 (2013.01); G06F 9/30036 (2013.01); G06F 9/30043 (2013.01); G06F 9/30047 (2013.01); G06F 9/30065 (2013.01); G06F 9/30079 (2013.01); G06F 9/3887 (2013.01); G06F 9/5011 (2013.01); G06F 9/5077 (2013.01); G06F 12/0215 (2013.01); G06F 12/0238 (2013.01); G06F 12/0246 (2013.01); G06F 12/0607 (2013.01); G06F 12/0802 (2013.01); G06F 12/0804 (2013.01); G06F 12/0811 (2013.01); G06F 12/0862 (2013.01); G06F 12/0866 (2013.01); G06F 12/0871 (2013.01); G06F 12/0875 (2013.01); G06F 12/0882 (2013.01); G06F 12/0888 (2013.01); G06F 12/0891 (2013.01); G06F 12/0893 (2013.01); G06F 12/0895 (2013.01); G06F 12/0897 (2013.01); G06F 12/1009 (2013.01); G06F 12/128 (2013.01); G06F 15/8046 (2013.01); G06F 17/16 (2013.01); G06F 17/18 (2013.01); G06T 1/20 (2013.01); G06T 1/60 (2013.01); H03M 7/46 (2013.01); G06F 9/3802 (2013.01); G06F 9/3818 (2013.01); G06F 9/3867 (2013.01); G06F 2212/1008 (2013.01); G06F 2212/1021 (2013.01); G06F 2212/1044 (2013.01); G06F 2212/302 (2013.01); G06F 2212/401 (2013.01); G06F 2212/455 (2013.01); G06F 2212/60 (2013.01); G06N 3/08 (2013.01); G06T 15/06 (2013.01)]

16 Claims

1. A graphics processor comprising:

a first processing cluster including a plurality of processing resources to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation, the plurality of processing resources interconnected via a data interconnect and include a ray tracing circuit to perform the ray tracing operation and a first matrix processing circuit to perform the matrix multiply operation; and

a second processing cluster coupled to the first processing cluster, wherein the second processing cluster includes a second matrix processing circuit including a floating-point unit to perform floating point operations associated with the matrix multiply operation, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.
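
The multiply-accumulate recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed circuit. It assumes that the second and third source operands are rounded to BF16 (round-to-nearest-even) before multiplication and that the first source operand is accumulated in a wider format, here a Python float. The claim does not specify the rounding mode or the accumulator precision, and the names to_bf16, bf16_mac, and bf16_dot_accumulate are hypothetical.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    Assumption: round-to-nearest-even on the float32 bit pattern. BF16 keeps
    the sign bit, the 8 exponent bits, and the top 7 mantissa bits of float32.
    Values outside the float32 range are not handled in this sketch.
    """
    if x != x:  # NaN passes through unchanged
        return x
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_mac(src0: float, src1: float, src2: float) -> float:
    """Return src0 + src1 * src2, with src1 and src2 rounded to BF16.

    Assumption: the accumulator (src0 and the result) is kept in a wider
    format than BF16.
    """
    return src0 + to_bf16(src1) * to_bf16(src2)


def bf16_dot_accumulate(acc: float, a: list, b: list) -> float:
    """Accumulate the dot product of a and b onto acc, one MAC per element pair."""
    for x, y in zip(a, b):
        acc = bf16_mac(acc, x, y)
    return acc


if __name__ == "__main__":
    print(bf16_mac(1.0, 2.0, 3.0))
    print(bf16_dot_accumulate(0.5, [1.0, 2.0], [3.0, 4.0]))
```

Keeping the accumulator wider than the multiplier inputs is one common reading of a hybrid floating point format; a hardware implementation may differ in rounding, denormal handling, and operand widths.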