US 11,755,501 B2
Efficient data sharing for graphics data processing operations
Joydeep Ray, Folsom, CA (US); Altug Koker, El Dorado Hills, CA (US); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Michael Macpherson, Portland, OR (US); Aravindh V. Anantaraman, Folsom, CA (US); Vasanth Ranganathan, El Dorado Hills, CA (US); Lakshminarayanan Striramassarma, Folsom, CA (US); Varghese George, Folsom, CA (US); Abhishek Appu, El Dorado Hills, CA (US); and Prasoonkumar Surti, Folsom, CA (US)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Mar. 25, 2021, as Appl. No. 17/212,503.
Claims priority of provisional application 63/000,784, filed on Mar. 27, 2020.
Prior Publication US 2021/0303481 A1, Sep. 30, 2021
Int. Cl. G06F 13/16 (2006.01); G06F 9/50 (2006.01); G06T 1/60 (2006.01); G06F 9/30 (2018.01); G06T 1/20 (2006.01); G06F 9/38 (2018.01)
CPC G06F 13/1605 (2013.01) [G06F 9/3004 (2013.01); G06F 9/3887 (2013.01); G06F 9/5016 (2013.01); G06T 1/20 (2013.01); G06T 1/60 (2013.01)]
20 Claims
 
1. An apparatus comprising:
a processing resource to generate a stream of instructions;
an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to:
determine that a set of memory requests in the stream of instructions access a same memory page; and
set a marker in a first request of the set of memory requests; and
arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the same memory page and to, in response to receiving the first request with the marker that is set, remain with the processing resource to process the set of memory requests.
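
The behavior recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the 4 KiB page size, the names `MemoryRequest`, `mark_on_page_runs`, and `arbitrate`, the encoding of the marker as a run length, and the round-robin policy are all hypothetical choices made for illustration; the claims do not specify any of them.

```python
from collections import deque
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes; not specified by the claims


@dataclass
class MemoryRequest:
    source: int          # id of the issuing processing resource
    address: int         # byte address of the access
    run_length: int = 0  # assumed marker encoding: 0 = not set, else size of the same-page run


def page_of(address):
    """Return the page number that contains the given address."""
    return address // PAGE_SIZE


def mark_on_page_runs(requests):
    """Model of the on-page detector: find runs of consecutive requests that
    hit the same page and set the marker in the first request of each run."""
    i = 0
    while i < len(requests):
        j = i + 1
        while j < len(requests) and page_of(requests[j].address) == page_of(requests[i].address):
            j += 1
        if j - i > 1:  # only runs of two or more requests get a marker
            requests[i].run_length = j - i
        i = j
    return requests


def arbitrate(queues):
    """Model of the arbiter: round-robin over per-source queues, except that a
    marked request makes the arbiter remain with that source until the whole
    same-page run has been granted."""
    pending = [deque(q) for q in queues]
    granted = []
    current = 0
    while any(pending):
        queue = pending[current]
        if queue:
            first = queue.popleft()
            granted.append((first.source, first.address))
            # Marker set: stay with this source for the rest of the run.
            for _ in range(max(first.run_length - 1, 0)):
                follow = queue.popleft()
                granted.append((follow.source, follow.address))
        current = (current + 1) % len(pending)
    return granted
```

With two sources, where source 0 issues three requests to one page followed by a request to another page, the model grants the three same-page requests back to back before moving on to source 1, rather than interleaving the sources on every grant.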
 
10. A method comprising:
generating, by a processing resource of a graphics processor, a stream of instructions;
determining, by an on-page detector circuit of an L1 cache of the graphics processor, that a set of memory requests in the stream of instructions access a same memory page;
setting, by the on-page detector circuit, a marker in a first request of the set of memory requests; and
in response to receiving the first request with the marker that is set, remaining, by arbitration circuitry of the graphics processor, with the processing resource to process the set of memory requests, wherein the arbitration circuitry is to route the set of memory requests to memory comprising the same memory page.
 
16. A non-transitory computer-readable medium having instructions stored thereon, which when executed by one or more processors, cause the one or more processors to:
generate, by a processing resource of the one or more processors, a stream of instructions;
determine, by an on-page detector circuit of an L1 cache of the one or more processors, that a set of memory requests in the stream of instructions access a same memory page;
set, by the on-page detector circuit, a marker in a first request of the set of memory requests; and
in response to receiving the first request with the marker that is set, remain, by arbitration circuitry of the one or more processors, with the processing resource to process the set of memory requests, wherein the arbitration circuitry is to route the set of memory requests to memory comprising the same memory page.