US 12,223,325 B2
Apparatus and method of optimizing divergent processing in thread groups
Daren Croxford, Swaffham Prior (GB); and Isidoros Sideris, Cambridge (GB)
Assigned to Arm Limited, Cambridge (GB)
Filed by Arm Limited, Cambridge (GB)
Filed on Jul. 24, 2023, as Appl. No. 18/357,503.
Claims priority of application No. 22386053 (EP), filed on Jul. 29, 2022.
Prior Publication US 2024/0036874 A1, Feb. 1, 2024
Int. Cl. G06F 9/30 (2018.01); G06F 9/38 (2018.01)
CPC G06F 9/3851 (2013.01) [G06F 9/3867 (2013.01)]
18 Claims
 
1. A method of operating a data processor in which execution threads may execute program instructions to perform data processing operations, and in which execution threads may be grouped together into thread groups in which the plural execution threads of a thread group can each execute a set of instructions in lockstep;
the data processor comprising:
an instruction execution processing circuit operable to execute instructions to perform processing operations for execution threads executing a program, wherein the instruction execution processing circuit is configured as a plurality of execution lanes, each execution lane being operable to perform processing operations for an execution thread of a thread group; and
an execution thread issuing circuit operable to issue execution threads of thread groups to the plurality of execution lanes of the instruction execution processing circuit for execution;
the method comprising:
determining, using the execution thread issuing circuit, whether active threads to be executed of a first thread group to perform a first operation and active threads to be executed of a second thread group to perform a second operation use different execution lanes of the plurality of execution lanes of the instruction execution processing circuit for execution in a processing cycle; and
issuing active threads from both first and second thread groups for execution across the different execution lanes in the processing cycle when the active threads to be executed of the first thread group and active threads to be executed of the second thread group use different execution lanes of the plurality of execution lanes of the instruction execution processing circuit for execution.
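
The determining and issuing steps of claim 1 can be illustrated with a minimal software sketch. It is not the claimed circuit. It assumes each thread group's active threads are represented as a bitmask with one bit per execution lane, so that the two groups use different lanes exactly when the masks do not overlap. The names `lanes_disjoint` and `issue_cycle`, the four-lane width, and the fallback of issuing only the first thread group when the lanes conflict are all hypothetical choices made for illustration; the claim does not specify them.

```python
from typing import List, Optional

NUM_LANES = 4  # assumed lane count; the claim does not fix a width


def lanes_disjoint(active_a: int, active_b: int) -> bool:
    """Return True when the two active-thread masks share no execution lane."""
    return (active_a & active_b) == 0


def issue_cycle(active_a: int, active_b: int,
                num_lanes: int = NUM_LANES) -> List[Optional[str]]:
    """Model one processing cycle as a per-lane assignment.

    Each entry is "A" (first thread group), "B" (second thread group),
    or None (lane idle). Both groups are issued together only when their
    active threads use different lanes. Otherwise only the first group is
    issued, which is an assumed fallback and not part of the claim.
    """
    merge = lanes_disjoint(active_a, active_b)
    lanes: List[Optional[str]] = []
    for lane in range(num_lanes):
        bit = 1 << lane
        if active_a & bit:
            lanes.append("A")
        elif merge and (active_b & bit):
            lanes.append("B")
        else:
            lanes.append(None)
    return lanes


if __name__ == "__main__":
    # Group A is active on lanes 0-1, group B on lanes 2-3: no overlap.
    print(issue_cycle(0b0011, 0b1100))
    # Both groups want lane 1, so only group A is issued in this cycle.
    print(issue_cycle(0b0011, 0b0110))
```

In the first call the masks do not overlap, so all four lanes are occupied in a single cycle. In the second call lane 1 is contested, so the sketch leaves the second group for a later cycle.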