| CPC G06N 3/08 (2013.01) [G06F 9/547 (2013.01); G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01)] | 20 Claims |

|
1. A graphics processing unit of a first distributed training node, the graphics processing unit comprising:
a compute block including one or more processing clusters, the one or more processing clusters to perform compute operations associated with a machine learning framework workflow, the compute operations to generate gradient data associated with trainable parameters of a neural network;
a memory device to store gradient data during distributed training of the neural network, the gradient data stored at a memory address in the memory device; and
a fabric interface coupled with the memory device and the compute block, the fabric interface configured to transmit gradient data stored in the memory device according to a pre-defined communication operation, wherein the memory device is a physical memory device shared with the compute block and the fabric interface is to automatically transmit the gradient data stored in memory to a second distributed training node, the second distributed training node determined based on the memory address of the gradient data in the memory device.
|
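The address-based routing recited in claim 1 can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the claimed hardware. The names `AddressRange`, `FabricInterface`, `SharedMemory`, `destination_for`, `on_store`, and `store_gradient` are hypothetical. The range-table layout and the choice to leave unmapped addresses local are also assumptions made for illustration. The sketch shows one way a store to a memory address could trigger transmission to a node selected from that address.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AddressRange:
    """Hypothetical mapping of an address window to a destination node."""
    base: int      # first byte address covered by the window
    size: int      # window length in bytes
    node_id: int   # distributed training node that receives data stored here


@dataclass
class FabricInterface:
    """Toy model of a fabric interface that routes by memory address."""
    ranges: list                               # non-overlapping AddressRange entries
    sent: list = field(default_factory=list)   # log of (node_id, address, payload)

    def __post_init__(self):
        # Sort once so each lookup is a binary search over window bases.
        self.ranges = sorted(self.ranges, key=lambda r: r.base)
        self._bases = [r.base for r in self.ranges]

    def destination_for(self, address):
        """Return the node mapped to this address, or None if unmapped."""
        i = bisect_right(self._bases, address) - 1
        if i >= 0:
            window = self.ranges[i]
            if address < window.base + window.size:
                return window.node_id
        return None

    def on_store(self, address, payload):
        """Record a transmission when the stored address maps to a node."""
        node = self.destination_for(address)
        if node is not None:
            self.sent.append((node, address, payload))
        return node


class SharedMemory:
    """Toy memory shared by the compute block and the fabric interface."""

    def __init__(self, fabric):
        self.cells = {}
        self.fabric = fabric

    def store_gradient(self, address, gradient):
        # The store itself triggers transmission; the caller issues no send.
        self.cells[address] = gradient
        return self.fabric.on_store(address, gradient)


fabric = FabricInterface([
    AddressRange(base=0x1000, size=0x1000, node_id=1),
    AddressRange(base=0x2000, size=0x1000, node_id=2),
])
memory = SharedMemory(fabric)
memory.store_gradient(0x1800, [0.1, -0.2])   # falls in the window mapped to node 1
memory.store_gradient(0x2004, [0.3])         # falls in the window mapped to node 2
memory.store_gradient(0x0100, [0.0])         # unmapped: stays local in this sketch
```

In this model the compute side only writes gradients to memory. The destination follows from where the data was placed, which mirrors the claim language that the second node is "determined based on the memory address of the gradient data."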