US 12,437,355 B2
Granular GPU DVFS with execution unit partial powerdown
Kenneth Daxer, Sunnyvale, CA (US); Stephen H. Gunther, Beaverton, OR (US); Michael N. Derr, El Dorado Hills, CA (US); and Eric Samson, Folsom, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jun. 23, 2023, as Appl. No. 18/340,139.
Application 18/340,139 is a continuation of application No. 18/185,008, filed on Mar. 16, 2023.
Claims priority of provisional application 63/321,726, filed on Mar. 20, 2022.
Claims priority of provisional application 63/321,725, filed on Mar. 20, 2022.
Prior Publication US 2023/0334613 A1, Oct. 19, 2023
Int. Cl. G06T 1/20 (2006.01); G06T 1/60 (2006.01)

CPC G06T 1/20 (2013.01) [G06T 1/60 (2013.01)]

24 Claims

1. A graphics processor comprising:

a plurality of dies integrated in a package, at least one die of the plurality of dies functionally heterogeneous relative to at least one other die of the plurality of dies and manufactured with a different process technology than the at least one other die,

the plurality of dies including:

a graphics compute die comprising a plurality of functional blocks, a first portion of the functional blocks associated with a first clock domain to operate at a first frequency and a second portion of the functional blocks associated with a second clock domain to operate at a second frequency different from the first frequency, the functional blocks including:

a plurality of graphics core blocks, at least one graphics core block comprising a plurality of graphics cores operable at the first frequency, each graphics core comprising:

a vector engine to execute vector instructions and perform parallel operations on data elements of vector operands, the data elements stored in vector registers;

a matrix accelerator to perform matrix multiplication operations with source matrix data elements, including 16-bit floating point and 8-bit integer matrix data elements;

a ray tracing accelerator to perform operations related to ray traversal and intersection; and

a shared local memory to store data to be processed by the vector engines, the matrix accelerator, or the ray tracing accelerator;

a command processor operable at the second frequency, the command processor to read a command associated with a workload from an in-memory buffer and to responsively identify one or more graphics core blocks of the plurality of graphics core blocks to execute the workload;

a plurality of cache dies, each cache die comprising a cache memory to store a portion of the data accessed by one or more of the plurality of graphics core blocks; and

a packet-switched interconnect fabric to directly connect the graphics compute die to each of the plurality of cache dies.
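
The command processor element of claim 1 can be illustrated with a minimal sketch. It is not taken from the patent: the class names, the frequency values, the `blocks_needed` field, and the "first idle blocks" selection policy are all hypothetical, and the claim itself does not say how graphics core blocks are identified. The sketch only shows a command processor in one clock domain reading a command from an in-memory buffer and choosing graphics core blocks, which run in another clock domain, to execute the workload.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GraphicsCoreBlock:
    block_id: int
    num_cores: int
    frequency_mhz: int = 2000   # first clock domain (illustrative value)
    busy: bool = False


@dataclass
class Command:
    workload: str
    blocks_needed: int          # hypothetical field, not named in the claim


@dataclass
class CommandProcessor:
    frequency_mhz: int = 1000   # second clock domain (illustrative value)
    # A deque stands in for the in-memory command buffer.
    buffer: deque = field(default_factory=deque)

    def submit(self, cmd: Command) -> None:
        self.buffer.append(cmd)

    def dispatch_next(self, blocks: list) -> list:
        """Read the oldest command and pick idle core blocks for its workload."""
        if not self.buffer:
            return []
        cmd = self.buffer[0]
        idle = [b for b in blocks if not b.busy]
        if len(idle) < cmd.blocks_needed:
            return []           # command stays queued until blocks free up
        self.buffer.popleft()
        chosen = idle[:cmd.blocks_needed]
        for b in chosen:
            b.busy = True
        return [b.block_id for b in chosen]


blocks = [GraphicsCoreBlock(block_id=i, num_cores=4) for i in range(4)]
cp = CommandProcessor()
cp.submit(Command("render_pass", blocks_needed=2))
selected = cp.dispatch_next(blocks)

cp.submit(Command("large_compute", blocks_needed=3))
deferred = cp.dispatch_next(blocks)   # only two blocks remain idle
```

In this sketch the first command is assigned to the two lowest-numbered idle blocks, and the second command remains in the buffer because fewer than three blocks are idle. A real scheduler could use any other selection policy.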