US 12,450,060 B1
Sharing loop cache instances among multiple threads in processor devices
Puneet Talwar, Austin, TX (US); Stephen Shannon, Austin, TX (US); Suresh Kumar Venkumahanti, Austin, TX (US); and Karan Suri, Austin, TX (US)
Assigned to QUALCOMM Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Aug. 28, 2024, as Appl. No. 18/817,548.
Int. Cl. G06F 9/38 (2018.01); G06F 9/30 (2018.01)
CPC G06F 9/30065 (2013.01) [G06F 9/30047 (2013.01); G06F 9/381 (2013.01); G06F 9/3851 (2013.01)] 20 Claims
 
1. A processor device, comprising:
a plurality of loop cache instances;
a plurality of use bit registers, each corresponding to a loop cache instance of the plurality of loop cache instances, and each comprising a plurality of use bits that each correspond to a thread of a plurality of threads; and
a loop cache controller circuit configured to:
detect a first iteration of a loop body, comprising a plurality of loop instructions, in an instruction stream executed by a first thread of the plurality of threads;
determine that the plurality of loop instructions were previously stored in a loop cache instance, of the plurality of loop cache instances, associated with a second thread of the plurality of threads; and
responsive to determining that the plurality of loop instructions were previously stored in the loop cache instance associated with the second thread:
set a use bit, corresponding to the first thread, of the use bit register of the plurality of use bit registers that corresponds to the loop cache instance; and
on a subsequent iteration of the loop body by the first thread, retrieve the plurality of loop instructions from the loop cache instance instead of an instruction cache, based on the use bit corresponding to the first thread.
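The sketch below is a simplified software model of the sharing mechanism recited in claim 1, not QUALCOMM's implementation. It shows how one loop cache instance can serve several threads through per-thread use bits. Several details are assumptions not stated in the claim, and every class and method name is hypothetical: a loop body is identified by the address of its first instruction, a missing loop is filled into the first free instance, each use bit register is an integer bitmask (bit t corresponds to thread t), and the release step on loop exit (on_loop_exit) is an illustrative policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Loop = Tuple[str, ...]  # hypothetical: a loop body as a tuple of instruction mnemonics


@dataclass
class LoopCacheInstance:
    loop_start: Optional[int] = None  # hypothetical tag: address of the loop's first instruction
    instructions: Loop = ()


class LoopCacheController:
    """Simplified model of a loop cache controller that shares instances among threads."""

    def __init__(self, num_instances: int, num_threads: int, icache: Dict[int, Loop]):
        self.instances: List[LoopCacheInstance] = [LoopCacheInstance() for _ in range(num_instances)]
        # One use bit register per loop cache instance; bit t set means thread t uses it.
        self.use_bits: List[int] = [0] * num_instances
        self.num_threads = num_threads
        self.icache = icache  # stand-in for the instruction cache

    def _lookup(self, loop_start: int) -> Optional[int]:
        for idx, inst in enumerate(self.instances):
            if inst.loop_start == loop_start:
                return idx
        return None

    def on_first_iteration(self, thread: int, loop_start: int) -> Optional[int]:
        """Handle detection of the first iteration of a loop body by `thread`."""
        if not 0 <= thread < self.num_threads:
            raise ValueError("thread out of range")
        idx = self._lookup(loop_start)
        if idx is None:
            # Hypothetical fill policy: capture the loop body into the first free instance.
            free = [i for i, inst in enumerate(self.instances) if inst.loop_start is None]
            if not free:
                return None  # no instance available; the thread keeps fetching from the icache
            idx = free[0]
            self.instances[idx] = LoopCacheInstance(loop_start, self.icache[loop_start])
        # If another thread already stored this loop, share that instance by
        # setting this thread's use bit instead of storing a second copy.
        self.use_bits[idx] |= 1 << thread
        return idx

    def fetch_loop_body(self, thread: int, loop_start: int) -> Tuple[Loop, str]:
        """Fetch a subsequent iteration; the thread's use bit selects the source."""
        idx = self._lookup(loop_start)
        if idx is not None and self.use_bits[idx] & (1 << thread):
            return self.instances[idx].instructions, "loop_cache"
        return self.icache[loop_start], "icache"

    def on_loop_exit(self, thread: int, loop_start: int) -> None:
        """Hypothetical release: clear the use bit and free the instance once no thread uses it."""
        idx = self._lookup(loop_start)
        if idx is None:
            return
        self.use_bits[idx] &= ~(1 << thread)
        if self.use_bits[idx] == 0:
            self.instances[idx] = LoopCacheInstance()
```

In this model, if thread 0 and then thread 1 both enter the loop at 0x100, they share instance 0, and its use bit register reads 0b11. A thread that never set its use bit (for example, thread 2) still fetches from the instruction cache. Tracking users with a bitmask lets the instance stay allocated until the last user's bit is cleared, which is one plausible reason to keep a separate use bit per thread.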