US 12,141,094 B2
Systolic disaggregation within a matrix accelerator architecture
Prasoonkumar Surti, Folsom, CA (US); Subramaniam Maiyuran, Gold River, CA (US); Valentin Andrei, San Jose, CA (US); Abhishek Appu, El Dorado Hills, CA (US); Varghese George, Folsom, CA (US); Altug Koker, El Dorado Hills, CA (US); Mike Macpherson, Portland, OR (US); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Vasanth Ranganathan, El Dorado Hills, CA (US); Joydeep Ray, Folsom, CA (US); Lakshminarayanan Striramassarma, Folsom, CA (US); and SungYe Kim, Folsom, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Appl. No. 17/428,233
Filed by INTEL CORPORATION, Santa Clara, CA (US)
PCT Filed Mar. 14, 2020, PCT No. PCT/US2020/022845
§ 371(c)(1), (2) Date Aug. 3, 2021,
PCT Pub. No. WO2020/190807, PCT Pub. Date Sep. 24, 2020.
Claims priority of provisional application 62/819,337, filed on Mar. 15, 2019.
Claims priority of provisional application 62/819,361, filed on Mar. 15, 2019.
Claims priority of provisional application 62/819,435, filed on Mar. 15, 2019.
Prior Publication US 2022/0129521 A1, Apr. 28, 2022
Int. Cl. G06F 9/30 (2018.01); G06F 7/544 (2006.01); G06F 7/575 (2006.01); G06F 7/58 (2006.01); G06F 9/38 (2018.01); G06F 9/50 (2006.01); G06F 12/02 (2006.01); G06F 12/06 (2006.01); G06F 12/0802 (2016.01); G06F 12/0804 (2016.01); G06F 12/0811 (2016.01); G06F 12/0862 (2016.01); G06F 12/0866 (2016.01); G06F 12/0871 (2016.01); G06F 12/0875 (2016.01); G06F 12/0882 (2016.01); G06F 12/0888 (2016.01); G06F 12/0891 (2016.01); G06F 12/0893 (2016.01); G06F 12/0895 (2016.01); G06F 12/0897 (2016.01); G06F 12/1009 (2016.01); G06F 12/128 (2016.01); G06F 15/78 (2006.01); G06F 15/80 (2006.01); G06F 17/16 (2006.01); G06F 17/18 (2006.01); G06T 1/20 (2006.01); G06T 1/60 (2006.01); H03M 7/46 (2006.01); G06N 3/08 (2023.01); G06T 15/06 (2011.01)
CPC G06F 15/7839 (2013.01) [G06F 7/5443 (2013.01); G06F 7/575 (2013.01); G06F 7/588 (2013.01); G06F 9/3001 (2013.01); G06F 9/30014 (2013.01); G06F 9/30036 (2013.01); G06F 9/3004 (2013.01); G06F 9/30043 (2013.01); G06F 9/30047 (2013.01); G06F 9/30065 (2013.01); G06F 9/30079 (2013.01); G06F 9/3887 (2013.01); G06F 9/5011 (2013.01); G06F 9/5077 (2013.01); G06F 12/0215 (2013.01); G06F 12/0238 (2013.01); G06F 12/0246 (2013.01); G06F 12/0607 (2013.01); G06F 12/0802 (2013.01); G06F 12/0804 (2013.01); G06F 12/0811 (2013.01); G06F 12/0862 (2013.01); G06F 12/0866 (2013.01); G06F 12/0871 (2013.01); G06F 12/0875 (2013.01); G06F 12/0882 (2013.01); G06F 12/0888 (2013.01); G06F 12/0891 (2013.01); G06F 12/0893 (2013.01); G06F 12/0895 (2013.01); G06F 12/0897 (2013.01); G06F 12/1009 (2013.01); G06F 12/128 (2013.01); G06F 15/8046 (2013.01); G06F 17/16 (2013.01); G06F 17/18 (2013.01); G06T 1/20 (2013.01); G06T 1/60 (2013.01); H03M 7/46 (2013.01); G06F 9/3802 (2013.01); G06F 9/3818 (2013.01); G06F 9/3867 (2013.01); G06F 2212/1008 (2013.01); G06F 2212/1021 (2013.01); G06F 2212/1044 (2013.01); G06F 2212/302 (2013.01); G06F 2212/401 (2013.01); G06F 2212/455 (2013.01); G06F 2212/60 (2013.01); G06N 3/08 (2013.01); G06T 15/06 (2013.01)] 20 Claims
1. A general-purpose graphics processing unit comprising:
a matrix accelerator including:
memory to store input data;
a systolic array coupled with the memory, the systolic array including multiple stages, wherein each of the multiple stages include multiple processing elements; and
circuitry to bypass a matrix multiply operation having zero-value inputs, the bypass performed based on metadata associated with the inputs, wherein each of the multiple processing elements include hardware logic to detect a zero-value input and bypass a matrix multiply operation based on the zero-value input.
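The two levels of bypass recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the names `ProcessingElement`, `zero_metadata`, and `systolic_matmul` are hypothetical, the metadata is assumed to be one all-zero flag per input row, and the stages of the systolic array are modelled as the steps of the inner accumulation loop.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    """Hypothetical model of one multiply-accumulate unit with zero detection."""
    performed: int = 0
    skipped: int = 0

    def mac(self, acc, a, b):
        # Element-level bypass: a zero operand cannot change the accumulator,
        # so the multiply is skipped and the accumulator passes through.
        if a == 0 or b == 0:
            self.skipped += 1
            return acc
        self.performed += 1
        return acc + a * b


def zero_metadata(matrix):
    """Assumed metadata format: one flag per row, True when the row is all zero."""
    return [all(v == 0 for v in row) for row in matrix]


def systolic_matmul(a, b):
    """Compute a @ b, bypassing work flagged by metadata or detected per element."""
    rows, depth, cols = len(a), len(b), len(b[0])
    row_is_zero = zero_metadata(a)
    pe = ProcessingElement()
    out = [[0] * cols for _ in range(rows)]
    bypassed_rows = 0
    for i in range(rows):
        if row_is_zero[i]:
            # Metadata-level bypass: the whole row of work is skipped
            # without inspecting individual values.
            bypassed_rows += 1
            continue
        for j in range(cols):
            acc = 0
            for k in range(depth):  # each k models one stage of the array
                acc = pe.mac(acc, a[i][k], b[k][j])
            out[i][j] = acc
    return out, bypassed_rows, pe


if __name__ == "__main__":
    result, bypassed, pe = systolic_matmul([[1, 0], [0, 0]], [[2, 3], [4, 5]])
    print(result, bypassed, pe.performed, pe.skipped)
```

In this model the second input row is skipped entirely on the strength of its metadata flag, while the zero in the first row is caught by the processing element itself. A single shared `ProcessingElement` is used only to keep the counters simple; an actual array would have many elements per stage.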