US 11,727,527 B2
Programmable coarse grained and sparse matrix compute hardware with advanced scheduling
Eriko Nurvitadhi, Hillsboro, OR (US); Balaji Vembu, Folsom, CA (US); Nicolas C. Galoppo Von Borries, Portland, OR (US); Rajkishore Barik, Santa Clara, CA (US); Tsung-Han Lin, Campbell, CA (US); Kamal Sinha, Rancho Cordova, CA (US); Nadathur Rajagopalan Satish, Santa Clara, CA (US); Jeremy Bottleson, Rancho Cordova, CA (US); Farshad Akhbari, Chandler, AZ (US); Altug Koker, El Dorado Hills, CA (US); Narayan Srinivasa, Portland, OR (US); Dukhwan Kim, San Jose, CA (US); Sara S. Baghsorkhi, San Jose, CA (US); Justin E. Gottschlich, Santa Clara, CA (US); Feng Chen, Shanghai (CN); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Kevin Nealis, San Jose, CA (US); Xiaoming Chen, Shanghai (CN); and Anbang Yao, Beijing (CN)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Dec. 3, 2021, as Appl. No. 17/541,413.
Application 17/541,413 is a continuation of application No. 16/928,353, filed on Jul. 14, 2020, granted, now 11,210,760.
Application 16/928,353 is a continuation of application No. 16/197,783, filed on Nov. 21, 2018, granted, now 10,769,748, issued on Sep. 8, 2020.
Application 16/197,783 is a continuation of application No. 15/581,182, filed on Apr. 28, 2017, granted, now 10,186,011, issued on Jan. 22, 2019.
Prior Publication US 2022/0164916 A1, May 26, 2022
Int. Cl. G06T 1/20 (2006.01); G06N 3/063 (2023.01); G06F 9/38 (2018.01); G06F 9/30 (2018.01); G06N 3/084 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01)
CPC G06T 1/20 (2013.01) [G06F 9/3001 (2013.01); G06F 9/3017 (2013.01); G06F 9/3851 (2013.01); G06F 9/3887 (2013.01); G06F 9/3895 (2013.01); G06N 3/04 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/08 (2013.01); G06N 3/084 (2013.01)]
20 Claims
1. A compute apparatus to perform compute operations, the compute apparatus comprising:
a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation including multiple pipeline commands;
a micro-controller to execute firmware instructions, the firmware instructions to enable a parameter analyzer to determine a type of complex compute operation to perform for the single instruction; and
a scheduler controller to schedule the multiple pipeline commands for the complex compute operation to one or more of multiple types of compute units, wherein the multiple types of compute units include a first sparse compute unit configured for input at a first level of sparsity and a second sparse compute unit configured for input at a second level of sparsity that is higher than the first level of sparsity.
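The flow recited in claim 1 (one instruction decoded into multiple pipeline commands, which are then scheduled to compute units configured for different sparsity levels) can be illustrated with a minimal software sketch. Everything named here is hypothetical and not taken from the patent: `PipelineCommand`, `decode`, `schedule`, the unit labels, and the 0.75 threshold are assumptions. The claim does not state how a unit is chosen, so selecting by the measured fraction of zero operands is only one possible policy, and the firmware-driven parameter analyzer is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed cutoff between the two sparse unit types (not from the patent).
HIGH_SPARSITY_THRESHOLD = 0.75


@dataclass(frozen=True)
class PipelineCommand:
    """One of the multiple commands produced from a single instruction."""
    name: str
    operand: Sequence[float]


def sparsity(values: Sequence[float]) -> float:
    """Return the fraction of entries that are exactly zero."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == 0) / len(values)


def decode(instruction: str,
           operand_blocks: Sequence[Sequence[float]]) -> List[PipelineCommand]:
    """Expand a single instruction into one pipeline command per operand block."""
    return [PipelineCommand(f"{instruction}.{i}", block)
            for i, block in enumerate(operand_blocks)]


def schedule(commands: Sequence[PipelineCommand]) -> List[Tuple[str, str]]:
    """Assign each command to the unit type matching its operand sparsity."""
    plan = []
    for cmd in commands:
        if sparsity(cmd.operand) >= HIGH_SPARSITY_THRESHOLD:
            unit = "sparse_unit_high"  # stands in for the second sparse compute unit
        else:
            unit = "sparse_unit_low"   # stands in for the first sparse compute unit
        plan.append((cmd.name, unit))
    return plan


if __name__ == "__main__":
    commands = decode("conv", [[1, 0, 2, 3], [0, 0, 0, 5]])
    for name, unit in schedule(commands):
        print(name, "->", unit)
```

In this sketch the first operand block is 25% zeros and the second is 75% zeros, so the two commands land on different unit types. A hardware scheduler controller would likely also weigh unit availability and command dependencies, which are not modeled here.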