CPC G06T 1/20 (2013.01) [G06T 15/005 (2013.01); G06T 2200/04 (2013.01)]; 20 Claims
1. An apparatus comprising:
a processing cluster including an array of multiprocessors coupled to an interconnect fabric;
scheduling circuitry to distribute a plurality of thread groups across the array of multiprocessors, each thread group comprising a plurality of threads and each thread comprising a plurality of instructions to be executed by at least one of the multiprocessors; and
a first multiprocessor of the array of multiprocessors to be assigned to process a first thread group comprising a first plurality of threads, the first multiprocessor comprising a plurality of parallel execution circuits,
wherein to process the first thread group, the plurality of parallel execution circuits is to execute instructions of a first thread sub-group and instructions of a second thread sub-group, the first and second thread sub-groups formed based on the first thread group, the first and second thread sub-groups each including a plurality of threads,
wherein the plurality of parallel execution circuits is to execute the instructions of the first thread sub-group to generate a first portion of an output data set and to execute the instructions of the second thread sub-group to generate a second portion of the output data set, the second thread sub-group having a data dependency on the first thread sub-group, and
wherein the first multiprocessor includes circuitry to cause threads of the second thread sub-group to sleep until the threads of the first thread sub-group have satisfied the data dependency.
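The dependency handling in claim 1 can be illustrated with a small software analogy. This is a minimal sketch under stated assumptions, not the claimed circuitry: ordinary OS threads stand in for the threads of the first thread group, a `threading.Event` stands in for the circuitry that puts sub-group B to sleep and wakes it, and the function names (`run_thread_group`, `subgroup_a`, `subgroup_b`) and the per-thread computations (squaring, then a neighbor sum) are hypothetical choices made only to create a real data dependency between the two portions of the output data set.

```python
import threading


def run_thread_group(inputs):
    """Hypothetical software model of one thread group split into two sub-groups.

    Sub-group A (one thread per input) writes the first portion out[0:n].
    Sub-group B (one thread per input) writes the second portion out[n:2n]
    and reads A's results, so B's threads sleep until all of A has finished.
    """
    n = len(inputs)
    out = [None] * (2 * n)
    if n == 0:
        return out

    remaining = n                       # A threads that have not yet finished
    lock = threading.Lock()
    dependency_met = threading.Event()  # stands in for the wake-up signal

    def subgroup_a(i):
        nonlocal remaining
        out[i] = inputs[i] * inputs[i]          # first portion of the output
        with lock:
            remaining -= 1
            if remaining == 0:                  # last A thread wakes sub-group B
                dependency_met.set()

    def subgroup_b(i):
        dependency_met.wait()                   # sleep until A's data is ready
        out[n + i] = out[i] + out[(i + 1) % n]  # second portion depends on A

    # Launch B first to show that its threads block instead of reading unset data.
    threads = [threading.Thread(target=subgroup_b, args=(i,)) for i in range(n)]
    threads += [threading.Thread(target=subgroup_a, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


print(run_thread_group([1, 2, 3, 4]))  # [1, 4, 9, 16, 5, 13, 25, 17]
```

A counter plus a single event was chosen so that sub-group B wakes exactly once, when the last sub-group A thread finishes. A `threading.Barrier` spanning both sub-groups would also enforce the ordering, but it would additionally make the A threads wait for the B threads to arrive, which the claim does not require.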