US 12,354,001 B2
Hardware implemented point to point communication primitives for machine learning
Srinivas Sridharan, Bangalore (IN); Karthikeyan Vaidyanathan, Bangalore (IN); and Dipankar Das, Pune (IN)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Oct. 25, 2022, as Appl. No. 17/972,832.
Application 17/972,832 is a division of application No. 15/869,510, filed on Jan. 12, 2018, granted, now Pat. No. 11,488,008.
Claims priority of provisional application 62/502,447, filed on May 5, 2017.
Prior Publication US 2023/0177328 A1, Jun. 8, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G06F 9/54 (2006.01); G06N 3/04 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2023.01); G06N 3/084 (2023.01)

CPC G06N 3/08 (2013.01) [G06F 9/547 (2013.01); G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01)]

20 Claims

1. A graphics processing unit of a first distributed training node, the graphics processing unit comprising:

a compute block including one or more processing clusters, the one or more processing clusters to perform compute operations associated with a machine learning framework workflow, the compute operations to generate gradient data associated with trainable parameters of a neural network;

a memory device to store gradient data during distributed training of the neural network, the gradient data stored at a memory address in the memory device; and

a fabric interface coupled with the memory device and the compute block, the fabric interface configured to transmit gradient data stored in the memory device according to a pre-defined communication operation, wherein the memory device is a physical memory device shared with the compute block and the fabric interface is to automatically transmit the gradient data stored in memory to a second distributed training node, the second distributed training node determined based on the memory address of the gradient data in the memory device.
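
The address-based routing recited in claim 1 can be pictured with a toy software model. The sketch below is illustrative only and is not the claimed hardware: the class names (`FabricInterface`, `SharedMemory`), the address-range table, the node identifiers, and the `sent` log are all hypothetical, and the model assumes non-overlapping address ranges. It shows only the idea that writing gradient data to a memory address can be enough to select the destination training node.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class FabricInterface:
    """Toy model: maps address ranges of a shared memory to destination nodes."""
    # Sorted (start, end_exclusive, node_id) entries; a hypothetical layout.
    ranges: list = field(default_factory=list)
    # Log of (node_id, address, payload) standing in for a real fabric send.
    sent: list = field(default_factory=list)

    def map_range(self, start, end, node_id):
        # Assumption: ranges never overlap, so sorting by start is sufficient.
        self.ranges.append((start, end, node_id))
        self.ranges.sort()

    def destination_for(self, address):
        # Find the last range whose start is <= address, then bounds-check it.
        starts = [r[0] for r in self.ranges]
        i = bisect_right(starts, address) - 1
        if i >= 0:
            start, end, node_id = self.ranges[i]
            if start <= address < end:
                return node_id
        return None  # address is not mapped to any remote node

    def on_store(self, address, payload):
        # "Automatic" transmit: no explicit send call by the compute code.
        node_id = self.destination_for(address)
        if node_id is not None:
            self.sent.append((node_id, address, payload))
        return node_id


class SharedMemory:
    """Memory shared by the compute block and the fabric interface (simulated)."""

    def __init__(self, fabric):
        self.cells = {}
        self.fabric = fabric

    def store(self, address, gradient):
        self.cells[address] = gradient
        # The store itself notifies the fabric interface.
        self.fabric.on_store(address, gradient)


fabric = FabricInterface()
fabric.map_range(0x1000, 0x2000, node_id=1)  # gradients destined for node 1
fabric.map_range(0x2000, 0x3000, node_id=2)  # gradients destined for node 2

memory = SharedMemory(fabric)
memory.store(0x1800, [0.1, -0.2])  # falls in node 1's range
memory.store(0x2400, [0.3])        # falls in node 2's range
memory.store(0x0100, [9.9])        # unmapped: stays local, nothing is sent
print(fabric.sent)
```

In this model the compute code only writes to memory; the destination is a property of the address, which is the aspect of the claim the sketch is meant to illustrate.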