US 12,367,043 B2
Multi-threaded barrel processor using shared weight registers in a common weights register file
Alan Graham Alexander, Wotton-Under-Edge (GB); Simon Christian Knowles, Corston (GB); and Mrudula Chidambar Gore, Bath (GB)
Assigned to Graphcore Limited, Bristol (GB)
Filed by Graphcore Limited, Bristol (GB)
Filed on Feb. 15, 2019, as Appl. No. 16/277,022.
Claims priority of application No. 1821301 (GB), filed on Dec. 31, 2018.
Prior Publication US 2020/0210175 A1, Jul. 2, 2020
Int. Cl. G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06N 3/02 (2006.01)
CPC G06F 9/3013 (2013.01) [G06F 9/3001 (2013.01); G06F 9/3851 (2013.01); G06N 3/02 (2013.01)]
21 Claims
 
1. A processor comprising:
a plurality of register files; and
an execution unit configured to execute instructions of an instruction set;
wherein the execution unit is a barrel-threaded execution unit configured to run a plurality of concurrent threads each in a different respective one of a repeating sequence of interleaved time slots, and for each of the concurrent threads, the plurality of register files comprises a context register file comprising a respective set of context registers arranged to hold a program state of the respective thread, each set of context registers comprising a respective set of arithmetic operand registers for use by the respective thread, wherein each context register file is accessible only by its own respective thread;
wherein the plurality of register files further comprises a common weights register file for which all the concurrent threads have read access, comprising a set of shared weights registers configured to hold weights common to some or all of the concurrent threads, wherein a first one of the concurrent threads and a second one of the concurrent threads both access the shared weights registers and are executed in different time slots in different execution cycles;
wherein the concurrent threads comprise a plurality of worker threads and the execution unit is further arranged to run, at any of the interleaved time slots, a supervisor subprogram comprising at least one supervisor thread configured to manage the worker threads;
wherein the supervisor subprogram is configured to write the weights in the shared weights registers, such that the weights in the shared weights registers can be written only by the supervisor subprogram, and the shared weights registers can be read only by the worker threads;
wherein the instruction set includes an arithmetic instruction having operands specifying a source of an input to be multiplied by at least one weight and a destination, the source and destination specified from amongst the respective set of arithmetic operand registers of the thread in which the arithmetic instruction is executed; and
wherein the execution unit is configured, in response to an opcode of the arithmetic instruction, to perform a multiplication operation comprising multiplying the input from said source by the at least one weight from at least one of the shared weights registers of the common weights register file, and to place a result in said destination.
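
The arrangement recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware or any Graphcore API. Every name in it (`Thread`, `CommonWeightsRegisterFile`, `execute_mul`, `run_barrel`), the register-file sizes, and the instruction encoding as a `(dst, src, weight_index)` tuple are hypothetical and chosen for illustration. For simplicity the supervisor writes the weights before the workers are scheduled, whereas the claim allows the supervisor subprogram to run at any of the interleaved time slots.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NUM_OPERAND_REGS = 4   # assumed size of each private context register file
NUM_WEIGHT_REGS = 2    # assumed size of the common weights register file


@dataclass
class Thread:
    """A thread with its own context (arithmetic operand) registers."""
    name: str
    is_supervisor: bool = False
    operands: List[float] = field(
        default_factory=lambda: [0.0] * NUM_OPERAND_REGS)
    # Hypothetical encoding: each instruction is (dst, src, weight_index).
    program: List[Tuple[int, int, int]] = field(default_factory=list)
    pc: int = 0


class CommonWeightsRegisterFile:
    """Shared weights: written only by the supervisor, read only by workers."""

    def __init__(self, size: int) -> None:
        self._regs = [0.0] * size

    def write(self, thread: Thread, index: int, value: float) -> None:
        if not thread.is_supervisor:
            raise PermissionError("only the supervisor may write shared weights")
        self._regs[index] = value

    def read(self, thread: Thread, index: int) -> float:
        if thread.is_supervisor:
            raise PermissionError("only worker threads may read shared weights")
        return self._regs[index]


def execute_mul(thread: Thread, weights: CommonWeightsRegisterFile,
                dst: int, src: int, widx: int) -> None:
    """Multiply an input from the thread's own registers by a shared weight.

    Source and destination are named in the instruction; the weight comes
    from the common weights register file rather than the thread's context.
    """
    thread.operands[dst] = thread.operands[src] * weights.read(thread, widx)


def run_barrel(slots: List[Thread], weights: CommonWeightsRegisterFile,
               cycles: int) -> List[Tuple[int, str]]:
    """Issue one instruction per cycle, rotating through the time slots."""
    trace = []
    for cycle in range(cycles):
        thread = slots[cycle % len(slots)]  # repeating sequence of slots
        if thread.pc < len(thread.program):
            dst, src, widx = thread.program[thread.pc]
            execute_mul(thread, weights, dst, src, widx)
            thread.pc += 1
            trace.append((cycle, thread.name))
    return trace


# The supervisor loads a weight into the shared register file.
supervisor = Thread("supervisor", is_supervisor=True)
weights = CommonWeightsRegisterFile(NUM_WEIGHT_REGS)
weights.write(supervisor, 0, 2.0)

# Two workers run the same multiply against the same shared weight,
# each on its own input held in its own context registers.
worker0 = Thread("worker0", program=[(1, 0, 0)])
worker1 = Thread("worker1", program=[(1, 0, 0)])
worker0.operands[0] = 3.0
worker1.operands[0] = 10.0

trace = run_barrel([worker0, worker1], weights, cycles=2)
print(trace, worker0.operands[1], worker1.operands[1])
```

In this model both workers read weight register 0 in different cycles, and each result lands in the issuing thread's own operand registers. A worker that attempts `weights.write(...)` raises `PermissionError`, mirroring the write restriction in the claim.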