| CPC G06F 12/0804 (2013.01) [G06F 12/0868 (2013.01); G06F 12/0875 (2013.01)] | 20 Claims |

|
1. A parallel processing unit (PPU) comprising:
a plurality of subpartition units configured to execute threads;
a cache coupled to the plurality of subpartition units; and
a request coalescer coupled to the plurality of subpartition units and the cache, wherein the request coalescer is to:
receive a first instruction to load a first data into a first register file associated with a first subpartition unit of the plurality of subpartition units;
receive a second instruction to load the first data into a second register file associated with a second subpartition unit of the plurality of subpartition units;
coalesce the first instruction and the second instruction into a first entry of the request coalescer based on instruction identifiers, wherein the first entry is associated with the first data;
determine that the first data is available in the cache; and
responsive to the determination that the first data is available in the cache, multicast the first data from the cache to the first register file and the second register file.
|
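The behavior recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the names `LoadInstruction`, `CoalescerEntry`, `RequestCoalescer`, `receive`, and `service` are hypothetical. The cache and register files are modeled as plain dictionaries, and the sketch assumes that two loads sharing the same instruction identifier are the ones merged into a single entry. The claim itself says only that coalescing is "based on instruction identifiers".

```python
from dataclasses import dataclass, field


@dataclass
class LoadInstruction:
    instr_id: int       # instruction identifier (assumed to be the coalescing key)
    subpartition: int   # index of the subpartition unit that issued the load
    dest_reg: str       # destination register in that unit's register file
    address: int        # address of the requested data


@dataclass
class CoalescerEntry:
    address: int
    # (subpartition, dest_reg) pairs waiting for the same data
    targets: list = field(default_factory=list)


class RequestCoalescer:
    """Toy model: merges matching loads and multicasts data on a cache hit."""

    def __init__(self, cache, register_files):
        self.cache = cache                    # dict: address -> data
        self.register_files = register_files  # dict: subpartition -> {reg: data}
        self.entries = {}                     # instr_id -> CoalescerEntry
        self.cache_reads = 0                  # counts cache lookups that returned data

    def receive(self, instr):
        # Coalesce: loads with the same identifier share one entry.
        entry = self.entries.setdefault(
            instr.instr_id, CoalescerEntry(instr.address)
        )
        entry.targets.append((instr.subpartition, instr.dest_reg))

    def service(self):
        # For each entry whose data is in the cache, read it once and
        # multicast it to every waiting register file.
        for instr_id in list(self.entries):
            entry = self.entries[instr_id]
            if entry.address in self.cache:
                data = self.cache[entry.address]
                self.cache_reads += 1
                for subpartition, dest_reg in entry.targets:
                    self.register_files[subpartition][dest_reg] = data
                del self.entries[instr_id]


# Usage: two subpartition units request the same data.
cache = {0x100: 0xABCD}
register_files = {0: {}, 1: {}}
coalescer = RequestCoalescer(cache, register_files)

coalescer.receive(LoadInstruction(instr_id=7, subpartition=0, dest_reg="r4", address=0x100))
coalescer.receive(LoadInstruction(instr_id=7, subpartition=1, dest_reg="r7", address=0x100))
coalescer.service()
```

In this model the two loads occupy one entry, so the cache is read once and the result is written to both register files. An entry whose data is not yet in the cache simply remains pending until a later `service` call.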