US 12,073,262 B2
Barrier synchronization between host and accelerator over network
Ola Torudbakken, Oslo (NO); and Wei-Lin Guay, Hvalstad (NO)
Assigned to GRAPHCORE LIMITED, Bristol (GB)
Filed by Graphcore Limited, Bristol (GB)
Filed on Jun. 4, 2021, as Appl. No. 17/338,898.
Claims priority of application No. 2010810 (GB), filed on Jul. 14, 2020.
Prior Publication US 2022/0019487 A1, Jan. 20, 2022
Int. Cl. G06F 9/52 (2006.01); G06F 9/38 (2018.01); G06F 9/54 (2006.01); G06F 15/173 (2006.01)
CPC G06F 9/522 (2013.01) [G06F 9/3851 (2013.01); G06F 9/543 (2013.01); G06F 9/544 (2013.01); G06F 15/173 (2013.01); G06F 15/17325 (2013.01)]
18 Claims
 
1. A host system configured to interface over a network with a plurality of subsystems comprising tile computers acting as work accelerators to the host system in accordance with local programs running on each of the subsystems, wherein the host system comprises at least one processor and at least one memory storing computer readable instructions,
wherein the at least one processor of the host system is configured to execute the computer readable instructions to compile a set of local programs, each local program for running on a different one of the subsystems, wherein compiling the set of local programs comprises defining a set of synchronisation groups, each synchronisation group comprising at least two of the subsystems,
wherein each local program comprises:
a set of compute instructions to perform, when executed by at least one processor of the respective subsystem, compute operations during one or more of a plurality of compute phases of the respective subsystem; and
a set of data transfer instructions to, when executed by at least one processor of the respective subsystem, perform a plurality of barrier synchronisations at predefined points in the execution of the respective local program, each of the plurality of barrier synchronisations between one of the plurality of compute phases and a respective exchange phase,
wherein each of at least two of the barrier synchronisations are defined between the at least two of the subsystems forming one of the synchronisation groups, wherein at least one of the barrier synchronisations is a synchronisation requiring host involvement for which, in response to an initiator of data transfer provided to the host system by the respective subsystem, data exchange is performed between the respective subsystem and the host system, and
wherein the at least one processor of the host system is configured to, in response to each of the initiators of data transfer, exchange data in the respective exchange phase over the network with one of the subsystems from which the initiator was received, the exchange of data comprising synchronising at least part of a buffer of the host system with at least part of a buffer of a network interface device connected to the one of the subsystems from which the initiator was received,
wherein:
the buffer of the network interface device is a first multi-stream buffer for storing data of a plurality of different streams, each stream having an identifier;
the buffer of the host system is a second multi-stream buffer for storing data of the plurality of different streams;
each of the initiators of data transfer comprises an identifier of a stream of the plurality of different streams between the host system and the one of the subsystems from which the identifier was received;
exchanging data over the network with one of the subsystems from which the identifier was received comprises sending and/or receiving data belonging to the respective identified stream; and
the at least part of the buffer of the host system comprises a part of the second multi-stream buffer associated with the identified stream, wherein the at least part of the buffer of the network interface device accessible to the one of the subsystems from which the identifier was received comprises a part of the first multi-stream buffer associated with the identified stream.
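
For illustration only, the arrangement recited in claim 1 can be modelled as a minimal Python sketch. It is not taken from the patent and is not the claimed implementation. The names `MultiStreamBuffer`, `Initiator`, `SyncGroup` and `exchange` are hypothetical, the buffers are modelled as in-memory lists rather than memory reached over a network, and the transfer direction flag is an assumption standing in for "sending and/or receiving".

```python
from dataclasses import dataclass, field


@dataclass
class MultiStreamBuffer:
    """Hypothetical buffer holding data for several streams, keyed by stream identifier."""
    parts: dict = field(default_factory=dict)

    def part(self, stream_id):
        # Return (creating it if needed) the part of the buffer tied to one stream.
        return self.parts.setdefault(stream_id, [])


@dataclass(frozen=True)
class Initiator:
    """Hypothetical 'initiator of data transfer': names the sender and one stream."""
    subsystem: str
    stream_id: int


class SyncGroup:
    """Hypothetical synchronisation group: a barrier over at least two subsystems."""

    def __init__(self, members):
        if len(members) < 2:
            raise ValueError("a synchronisation group needs at least two subsystems")
        self.members = frozenset(members)
        self.waiting = set()

    def arrive(self, subsystem):
        # Record arrival at the barrier; return True once every member has arrived.
        if subsystem not in self.members:
            raise KeyError(subsystem)
        self.waiting.add(subsystem)
        if self.waiting == self.members:
            self.waiting = set()  # reset so the group can be reused at the next barrier
            return True
        return False


def exchange(host_buf, nic_buf, initiator, to_subsystem=True):
    """Synchronise only the buffer parts associated with the identified stream."""
    host_part = host_buf.part(initiator.stream_id)
    nic_part = nic_buf.part(initiator.stream_id)
    if to_subsystem:
        nic_part[:] = host_part   # host buffer part -> network interface buffer part
    else:
        host_part[:] = nic_part   # network interface buffer part -> host buffer part


# Usage: two subsystems reach a barrier, then one stream is synchronised.
group = SyncGroup({"subsystem_a", "subsystem_b"})
group.arrive("subsystem_a")
barrier_passed = group.arrive("subsystem_b")

host_buffer = MultiStreamBuffer()   # stands in for the "second multi-stream buffer"
nic_buffer = MultiStreamBuffer()    # stands in for the "first multi-stream buffer"
host_buffer.part(7).extend([1, 2, 3])
host_buffer.part(8).extend([9])
exchange(host_buffer, nic_buffer, Initiator("subsystem_a", stream_id=7))
```

In this sketch only the part associated with stream 7 is copied. The data held for stream 8 stays in the host buffer, mirroring the claim's restriction of the synchronisation to the part of each multi-stream buffer associated with the identified stream.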