| CPC G06F 17/16 (2013.01) [G06F 7/57 (2013.01); G06F 9/30036 (2013.01); G06F 9/3851 (2013.01); G06F 9/3887 (2013.01)] | 18 Claims |

|
1. An apparatus for data processing, comprising:
at least one memory; and
at least one processor coupled to the at least one memory and configured to:
fetch an element of an input matrix from graphics memory;
determine whether the element of the input matrix is to be used across multiple threads; and
store the element of the input matrix at a buffer until a workgroup corresponding to at least one of the multiple threads is executed in response to a first determination that the element of the input matrix is to be used across multiple threads, or store the element of the input matrix at a general purpose register (GPR) in response to a second determination that the element of the input matrix is not to be used across multiple threads.
|
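The storage decision recited in claim 1 can be illustrated with a small host-side model. This is a minimal sketch under stated assumptions, not the claimed hardware implementation: `ToyStorage`, `place_element`, `execute_workgroup`, and the `consumer_threads` argument are hypothetical names introduced for illustration, and plain dictionaries stand in for graphics memory, the buffer, and the general purpose registers. The sketch also assumes the set of threads that consume an element is known when the element is fetched.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStorage:
    """Hypothetical stand-in for the two storage targets named in the claim."""
    shared_buffer: dict = field(default_factory=dict)  # holds elements until the workgroup runs
    gprs: dict = field(default_factory=dict)           # per-thread general purpose registers


def place_element(storage, graphics_memory, index, consumer_threads):
    """Fetch one matrix element and store it according to how many threads use it."""
    if not consumer_threads:
        raise ValueError("an element needs at least one consuming thread")
    value = graphics_memory[index]  # fetch from the simulated graphics memory
    if len(consumer_threads) > 1:
        # First determination: the element is used across multiple threads.
        storage.shared_buffer[index] = value
        return "buffer"
    # Second determination: the element is not used across multiple threads.
    storage.gprs[(consumer_threads[0], index)] = value
    return "gpr"


def execute_workgroup(storage, indices):
    """Hand buffered elements to the workgroup and release their buffer slots."""
    return [storage.shared_buffer.pop(i) for i in indices]
```

In this model, an element consumed by threads 0 through 3 is placed in `shared_buffer` and stays there until `execute_workgroup` is called for its index, while an element consumed by a single thread goes directly into that thread's entry in `gprs`.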