US 12,073,489 B2
Handling pipeline submissions across many compute units
Balaji Vembu, Folsom, CA (US); Altug Koker, El Dorado Hills, CA (US); and Joydeep Ray, Folsom, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Apr. 13, 2023, as Appl. No. 18/300,052.
Application 18/300,052 is a continuation of application No. 17/591,152, filed on Feb. 2, 2022, granted, now 11,803,934.
Application 17/591,152 is a continuation of application No. 17/197,126, filed on Mar. 10, 2021, granted, now 11,244,420, issued on Feb. 8, 2022.
Application 17/197,126 is a continuation of application No. 16/834,902, filed on Mar. 30, 2020, granted, now 10,977,762, issued on Apr. 13, 2021.
Application 16/834,902 is a continuation of application No. 16/446,946, filed on Jun. 20, 2019, granted, now 10,896,479, issued on Jan. 19, 2021.
Application 16/446,946 is a continuation of application No. 16/150,012, filed on Oct. 2, 2018, granted, now 10,497,087, issued on Dec. 3, 2019.
Application 16/150,012 is a continuation of application No. 15/493,233, filed on Apr. 21, 2017, granted, now 10,325,341, issued on Jun. 18, 2019.
Prior Publication US 2023/0252597 A1, Aug. 10, 2023
Int. Cl. G06T 1/20 (2006.01); G06T 15/00 (2011.01)
CPC G06T 1/20 (2013.01) [G06T 15/005 (2013.01); G06T 2200/04 (2013.01)] 20 Claims
OG exemplary drawing
 
1. An apparatus comprising:
a processing cluster including an array of multiprocessors coupled to an interconnect fabric;
scheduling circuitry to distribute a plurality of thread groups across the array of multiprocessors, each thread group comprising a plurality of threads and each thread comprising a plurality of instructions to be executed by at least one of the multiprocessors; and
a first multiprocessor of the array of multiprocessors to be assigned to process a first thread group comprising a first plurality of threads, the first multiprocessor comprising a plurality of parallel execution circuits,
wherein to process the first thread group, the plurality of parallel execution circuits is to execute instructions of a first thread sub-group and instructions of a second thread sub-group, the first and second thread sub-groups formed based on the first thread group, the first and second thread sub-groups each including a plurality of threads,
wherein the plurality of parallel execution circuits is to execute the instructions of the first thread sub-group to generate a first portion of an output data set and to execute the instructions of the second thread sub-group to generate a second portion of the output data set, the second thread sub-group having a data dependency on the first thread sub-group, and
wherein the first multiprocessor includes circuitry to cause threads of the second thread sub-group to sleep until the threads of the first thread sub-group have satisfied the data dependency.
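The sleep-until-dependency mechanism recited in the final clause can be illustrated with a software analogue. The sketch below is not from the patent: it models the claimed hardware circuitry with a `threading.Event`, and all names (`run_thread_group`, `split`, `workers_per_subgroup`) are illustrative assumptions. A first sub-group of threads produces the first portion of a shared output buffer; a second sub-group of threads blocks ("sleeps") until the first sub-group has finished, then consumes that result to produce the second portion.

```python
import threading

def run_thread_group(data, split, workers_per_subgroup=2):
    # Shared output buffer: the first sub-group writes indices [0, split),
    # the second sub-group writes indices [split, len(data)).
    output = [None] * len(data)

    # Models the circuitry that keeps the second sub-group asleep until
    # the first sub-group has satisfied the data dependency.
    dependency_met = threading.Event()
    lock = threading.Lock()
    unfinished = [workers_per_subgroup]

    def slices(lo, hi, n):
        # Partition index range [lo, hi) into n contiguous slices.
        step = (hi - lo + n - 1) // n
        return [(lo + k * step, min(lo + (k + 1) * step, hi))
                for k in range(n)]

    def first_worker(lo, hi):
        for i in range(lo, hi):
            output[i] = data[i] * 2          # first portion of the output set
        with lock:
            unfinished[0] -= 1
            if unfinished[0] == 0:
                dependency_met.set()         # wake the sleeping sub-group

    def second_worker(lo, hi):
        dependency_met.wait()                # "sleep" until dependency is met
        base = output[0]                     # consumes the first portion
        for i in range(lo, hi):
            output[i] = data[i] * 2 + base   # second portion of the output set

    # Deliberately start the dependent sub-group first: its threads simply
    # wait on the event, so the final result is deterministic either way.
    threads = [threading.Thread(target=second_worker, args=s)
               for s in slices(split, len(data), workers_per_subgroup)]
    threads += [threading.Thread(target=first_worker, args=s)
               for s in slices(0, split, workers_per_subgroup)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output
```

Because the second sub-group's threads block on the event rather than busy-wait, the sketch mirrors the claim's "sleep" semantics: no second-sub-group thread reads the first portion before every first-sub-group thread has written its slice.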