| CPC G06N 3/08 (2013.01) [G06F 9/50 (2013.01); G06F 9/5061 (2013.01); G06F 9/5077 (2013.01); G06N 3/04 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01); G06N 3/088 (2013.01); G06N 3/048 (2023.01); G06N 7/01 (2023.01)] | 20 Claims |

|
1. An apparatus comprising:
an interconnect switch configured to couple with a plurality of graphics processors via a plurality of point-to-point interconnects; and
one or more processors including a graphics processor, the graphics processor coupled with the interconnect switch via a point-to-point interconnect of the plurality of point-to-point interconnects, the graphics processor comprising:
a cluster of graphics multiprocessors configured for single instruction multiple thread (SIMT) operation, the cluster of graphics multiprocessors interconnected via a data interconnect and configured to exchange data via the data interconnect, the cluster of graphics multiprocessors including a graphics multiprocessor configured to:
receive data associated with a first thread group to be executed via the graphics multiprocessor during execution of operations associated with a second thread group, the data to be received via a point-to-point interconnect coupled with the interconnect switch in association with a communication pattern for messages to be transmitted between worker nodes of a first group of worker nodes configured to perform distributed training of a neural network; and
transmit data processed by the second thread group via the point-to-point interconnect coupled with the interconnect switch during execution of operations associated with the first thread group.
|