US 12,306,771 B2
Efficient data sharing for graphics data processing operations
Joydeep Ray, Folsom, CA (US); Altug Koker, El Dorado Hills, CA (US); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Michael Macpherson, Portland, OR (US); Aravindh V. Anantaraman, Folsom, CA (US); Vasanth Ranganathan, El Dorado Hills, CA (US); Lakshminarayanan Striramassarma, Folsom, CA (US); Varghese George, Folsom, CA (US); Abhishek Appu, El Dorado Hills, CA (US); and Prasoonkumar Surti, Folsom, CA (US)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on May 22, 2024, as Appl. No. 18/671,095.
Application 18/671,095 is a continuation of application No. 18/358,550, filed on Jul. 25, 2023, granted, now 12,032,496.
Application 18/358,550 is a continuation of application No. 17/212,503, filed on Mar. 25, 2021, granted, now 11,755,501, issued on Sep. 12, 2023.
Claims priority of provisional application 63/000,784, filed on Mar. 27, 2020.
Prior Publication US 2024/0385975 A1, Nov. 21, 2024
Int. Cl. G06F 13/16 (2006.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06F 9/50 (2006.01); G06T 1/20 (2006.01); G06T 1/60 (2006.01)

CPC G06F 13/1605 (2013.01) [G06F 9/3004 (2013.01); G06F 9/3887 (2013.01); G06F 9/3888 (2023.08); G06F 9/38885 (2023.08); G06F 9/5016 (2013.01); G06T 1/20 (2013.01); G06T 1/60 (2013.01)]

20 Claims

1. An apparatus comprising:

a processing resource of a chiplet; and

an L1 cache communicably coupled to the processing resource and comprising synchronization hardware circuit to:

allocate a first thread group as a member of a super thread group (STG) comprising a collection of thread groups running on the chiplet;

receive from a second thread group within the STG, a request to communicate with the first thread group identified by a thread group identifier (ID) within the STG;

access a routing table to determine a location of the first thread group based on the thread group ID; and

route the request to the determined location of the first thread group using communication links between L1 caches of the chiplet.

10. A method comprising:

allocating, by a synchronization hardware circuit of an L1 cache of a graphics processor, a first thread group as a member of a super thread group (STG) comprising a collection of thread groups running on the graphics processor;

receiving, by the synchronization hardware circuit, from a second thread group within the STG, a request to communicate with the first thread group identified by a thread group identifier (ID) within the STG;

accessing, by the synchronization hardware circuit, a routing table to determine a location of the first thread group based on the thread group ID; and

routing, by the synchronization hardware circuit, the request to the determined location of the first thread group using communication links between L1 caches of the graphics processor.

16. A non-transitory computer-readable medium having instructions stored thereon, which when executed by one or more processors, cause the one or more processors to:

allocate, by a synchronization hardware circuit of an L1 cache of the one or more processors, a first thread group as a member of a super thread group (STG) comprising a collection of thread groups running on the one or more processors;

receive, by the synchronization hardware circuit, from a second thread group within the STG, a request to communicate with the first thread group identified by a thread group identifier (ID) within the STG;

access, by the synchronization hardware circuit, a routing table to determine a location of the first thread group based on the thread group ID; and

route, by the synchronization hardware circuit, the request to the determined location of the first thread group using communication links between L1 caches of the one or more processors.
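
The four steps shared by independent claims 1, 10, and 16 (allocate, receive, access the routing table, route) can be illustrated with a minimal software sketch. This is only a toy model under stated assumptions, not the claimed hardware: the claims recite a synchronization hardware circuit in an L1 cache, whereas the classes `L1Cache` and `SuperThreadGroup`, the `inbox` structure, and the methods `allocate` and `send` are hypothetical names introduced here for illustration. The sketch also assumes the routing table maps a thread group ID directly to the L1 cache hosting that thread group, which the claims do not specify.

```python
from dataclasses import dataclass, field


@dataclass
class L1Cache:
    """Hypothetical stand-in for one L1 cache and the thread groups it hosts."""
    cache_id: int
    # thread group ID -> list of (sender ID, payload) delivered to that group
    inbox: dict = field(default_factory=dict)


class SuperThreadGroup:
    """Hypothetical model of an STG: member thread groups plus a routing table."""

    def __init__(self, caches):
        self.caches = {c.cache_id: c for c in caches}
        # Assumed layout: thread group ID -> ID of the L1 cache where it runs
        self.routing_table = {}

    def allocate(self, tg_id, cache_id):
        """Step 1: make a thread group a member of the STG."""
        if tg_id in self.routing_table:
            raise ValueError(f"thread group {tg_id} is already a member")
        if cache_id not in self.caches:
            raise KeyError(f"unknown L1 cache {cache_id}")
        self.routing_table[tg_id] = cache_id
        self.caches[cache_id].inbox[tg_id] = []

    def send(self, src_tg_id, dst_tg_id, payload):
        """Steps 2-4: receive a request, look up the target, route it."""
        # Step 2: the request must come from a thread group within the STG
        if src_tg_id not in self.routing_table:
            raise PermissionError(f"thread group {src_tg_id} is not in the STG")
        # Step 3: access the routing table using the target thread group ID
        dst_cache_id = self.routing_table[dst_tg_id]
        # Step 4: deliver to the determined location (models the L1-to-L1 link)
        self.caches[dst_cache_id].inbox[dst_tg_id].append((src_tg_id, payload))
        return dst_cache_id


# Usage: two thread groups on different L1 caches exchange one message.
stg = SuperThreadGroup([L1Cache(0), L1Cache(1)])
stg.allocate(tg_id=7, cache_id=1)   # the "first thread group"
stg.allocate(tg_id=3, cache_id=0)   # the "second thread group"
routed_to = stg.send(src_tg_id=3, dst_tg_id=7, payload="barrier-arrive")
```

In this sketch the sender names only the target's thread group ID; the lookup in `routing_table` resolves the physical location, which mirrors the claim language of determining a location "based on the thread group ID" before routing.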