US 12,135,981 B2
Systems, methods, and apparatuses for heterogeneous computing
Rajesh M. Sankaran, Portland, OR (US); Gilbert Neiger, Hillsboro, OR (US); Narayan Ranganathan, Bangalore (IN); Stephen R. Van Doren, Portland, OR (US); Joseph Nuzman, Haifa (IL); Niall D. McDonnell, Limerick (IE); Michael A. O'Hanlon, Limerick (IE); Lokpraveen B. Mosur, Gilbert, AZ (US); Tracy Garrett Drysdale, Paradise Valley, AZ (US); Eriko Nurvitadhi, Hillsboro, OR (US); Asit K. Mishra, Hillsboro, OR (US); Ganesh Venkatesh, Hillsboro, OR (US); Deborah T. Marr, Portland, OR (US); Nicholas P. Carter, Somerville, MA (US); Jonathan D. Pearce, Hillsboro, OR (US); Edward T. Grochowski, San Jose, CA (US); Richard J. Greco, Hillsboro, OR (US); Robert Valentine, Kiryat Tivon (IL); Jesus Corbal, King City, OR (US); Thomas D. Fletcher, Sherwood, OR (US); Dennis R. Bradford, Portland, OR (US); Dwight P. Manley, Holliston, MA (US); Mark J. Charney, Lexington, MA (US); Jeffrey J. Cook, Portland, OR (US); Paul Caprioli, Hillsboro, OR (US); Koichi Yamada, Los Gatos, CA (US); Kent D. Glossop, Merrimack, NH (US); and David B. Sheffield, Hillsboro, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jun. 9, 2023, as Appl. No. 18/207,870.
Application 18/207,870 is a continuation of application No. 17/381,521, filed on Jul. 21, 2021, granted, now 11,693,691.
Application 17/381,521 is a continuation of application No. 16/913,265, filed on Jun. 26, 2020, granted, now 11,093,277, issued on Aug. 17, 2021.
Application 16/913,265 is a continuation of application No. 16/474,978, granted, now 11,416,281, issued on Aug. 16, 2022, previously published as PCT/US2016/069640, filed on Dec. 31, 2016.
Prior Publication US 2023/0418655 A1, Dec. 28, 2023
Int. Cl. G06F 9/48 (2006.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01)

CPC G06F 9/48 (2013.01) [G06F 9/3001 (2013.01); G06F 9/30036 (2013.01); G06F 9/3004 (2013.01); G06F 9/383 (2013.01)]

20 Claims

10. An integrated circuit comprising:

a first plurality of cores comprising a first microarchitecture;

a second plurality of cores comprising a second microarchitecture different from the first microarchitecture;

an interconnect coupled to the first and second plurality of cores; and

an accelerator coupled to the interconnect, the accelerator to perform matrix processing operations, the accelerator comprising:

an array of multiply-accumulate units operable in response to multiply-accumulate instructions to perform multiply-accumulate operations with a first plurality of data elements of a first matrix and a second plurality of data elements of a second matrix,

a plurality of memories associated with the array of multiply-accumulate units, the plurality of memories to store the first plurality of data elements and the second plurality of data elements,

each multiply-accumulate unit in the array of multiply-accumulate units comprising:

multiplication circuitry to multiply each data element of a subset of the first plurality of data elements with a corresponding data element of a subset of the second plurality of data elements to generate a corresponding plurality of products; and

adder circuitry to add the plurality of products to generate a corresponding result data element of a plurality of result data elements,

wherein at least one core of the first plurality of cores or the second plurality of cores is configured to execute program code to schedule the matrix processing operations.
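
For illustration only, the arithmetic recited for each multiply-accumulate unit (element-wise multiplication of two subsets, followed by addition of the products) can be modeled in software as a dot product, and the array as one such unit per result data element. The sketch below is not the claimed circuitry and is not taken from the specification. The names mac_unit and mac_array_matmul are hypothetical, and the row-by-column mapping of subsets to units is an assumption made for the example. Scheduling by a core and the plurality of memories are not modeled.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mac_unit(a_subset: Sequence[float], b_subset: Sequence[float]) -> float:
    """Model one multiply-accumulate unit (hypothetical software analogue)."""
    if len(a_subset) != len(b_subset):
        raise ValueError("subsets must have the same number of data elements")
    # Multiplication step: each element times its corresponding element.
    products = [a * b for a, b in zip(a_subset, b_subset)]
    # Adder step: the products are summed into one result data element.
    return sum(products)


def mac_array_matmul(first: Matrix, second: Matrix) -> Matrix:
    """Model the array: one mac_unit call per result data element.

    Assumption: each unit receives one row of the first matrix and one
    column of the second matrix. Both matrices are assumed non-empty.
    """
    inner = len(second)
    if any(len(row) != inner for row in first):
        raise ValueError("inner dimensions of the matrices do not match")
    n_cols = len(second[0])
    # Gather the columns of the second matrix so each unit sees a flat subset.
    columns = [[second[k][j] for k in range(inner)] for j in range(n_cols)]
    return [[mac_unit(row, col) for col in columns] for row in first]


if __name__ == "__main__":
    result = mac_array_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result)  # [[19, 22], [43, 50]]
```

In this model the calls to mac_unit are independent of one another, which is what would allow an array of hardware units to compute the result data elements in parallel.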