CPC G06F 12/1081 (2013.01) [G06F 9/544 (2013.01); G06F 2212/65 (2013.01)]    20 Claims

1. A coarse-grained reconfigurable processor system for implementing data-parallel training of a neural network, comprising:
a first memory;
a set of coarse-grained reconfigurable units (CGRUs) in a first coarse-grained reconfigurable processor that is coupled to the first memory and configured to implement at least a portion of the neural network, to determine first and second gradients, respectively, of first and second model parameters based on a batch of training data, and to store the first and second gradients in the first memory;
a network interface including an external direct memory access (DMA) engine coupled between the first memory and a network; and
a work queue associated with the external DMA engine, wherein completion of determining the first gradient triggers a first work queue entry of the work queue that directs the external DMA engine to transfer the first gradient for a gradient reduction operation from the first memory over the network to a second memory that is coupled to a second coarse-grained reconfigurable processor.
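The control flow claimed above — gradient completion triggering a work queue entry that directs an external DMA engine to move that gradient to a second processor's memory — can be illustrated with a minimal simulation sketch. All class and variable names here (`ExternalDMAEngine`, `WorkQueue`, `first_memory`, etc.) are hypothetical and chosen for illustration only; they do not correspond to any real CGRA runtime or DMA API, and the dictionaries merely stand in for the first and second memories.

```python
import queue

class ExternalDMAEngine:
    """Hypothetical model of a DMA engine that copies gradient buffers
    from a local memory to a remote memory over a network."""
    def __init__(self, remote_memory):
        self.remote_memory = remote_memory

    def transfer(self, local_memory, key):
        # Stand-in for a network transfer of one named gradient buffer.
        self.remote_memory[key] = local_memory[key]

class WorkQueue:
    """Hypothetical work queue associated with the DMA engine: each entry
    directs the engine to transfer one gradient."""
    def __init__(self, dma, local_memory):
        self.entries = queue.Queue()
        self.dma = dma
        self.local_memory = local_memory

    def post(self, key):
        # Posting an entry models the trigger that fires when a
        # gradient finishes being computed and stored.
        self.entries.put(key)

    def drain(self):
        # The DMA engine services entries in order.
        while not self.entries.empty():
            self.dma.transfer(self.local_memory, self.entries.get())

# First processor's memory holds gradients; second processor's memory
# receives them for the gradient reduction operation.
first_memory = {}
second_memory = {}
dma = ExternalDMAEngine(second_memory)
wq = WorkQueue(dma, first_memory)

# Completion of the first gradient triggers a work queue entry, so its
# transfer can overlap with computation of the second gradient.
first_memory["grad_w1"] = [0.1, -0.2]   # first gradient determined
wq.post("grad_w1")
first_memory["grad_w2"] = [0.05]        # second gradient determined
wq.post("grad_w2")

wq.drain()
print(second_memory)
```

The point of the sketch is the decoupling: the compute units only post entries, while the DMA engine independently drains the queue, which is what lets gradient transfer overlap with ongoing gradient computation in a data-parallel training step.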