US 11,709,664 B2
Anti-congestion flow control for reconfigurable processors
Weiwei Chen, Mountain View, CA (US); Raghu Prabhakar, San Jose, CA (US); David Alan Koeplinger, Menlo Park, CA (US); Sitanshu Gupta, San Jose, CA (US); Ruddhi Arun Chaphekar, Santa Clara, CA (US); Ajit Punj, San Jose, CA (US); and Sumti Jairath, Santa Clara, CA (US)
Assigned to SambaNova Systems, Inc., Palo Alto, CA (US)
Filed by SambaNova Systems, Inc., Palo Alto, CA (US)
Filed on Jun. 2, 2020, as Appl. No. 16/890,841.
Prior Publication US 2021/0373867 A1, Dec. 2, 2021
Int. Cl. G06F 8/41 (2018.01); G06F 15/78 (2006.01); G06F 15/82 (2006.01)
CPC G06F 8/452 (2013.01) [G06F 8/41 (2013.01); G06F 15/7867 (2013.01); G06F 15/825 (2013.01)] 31 Claims
1. A system, comprising:
memory storing a dataflow graph with a plurality of compute nodes that asynchronously transmit data along data connections, wherein the dataflow graph includes a loop nest in which loops are arranged in a hierarchy of levels, such that a loop at a second level is within a loop at a first level; and
a compiler having access to the memory and configured to:
partition execution of the dataflow graph into two or more asynchronous stages by inserting stage buffers between compute nodes of the plurality of compute nodes that are inside the loop at the second level and at input/output boundaries between the loop at the first level and the loop at the second level;
classify the stage buffers as producers and consumers on a stage-by-stage basis by classifying those stage buffers that provide input data to a particular stage as the producers, and classifying those stage buffers that store output data from the particular stage as the consumers, wherein the particular stage includes at least one compute node of the plurality of compute nodes;
create control connections between the stage buffers by extending the control connections from the consumers of the particular stage to the producers for the particular stage;
configure each of the producers with a write credit counter that is initialized with one or more write credits and a ready-to-read credit counter initialized such that the ready-to-read credit counter of a particular producer has as many read credits as a buffer depth of a corresponding consumer that reads data from the particular producer; and
control data transmission between the compute nodes along the data connections by using the control connections to control writing of the data by the producers into the consumers, such that writing is only performed when both the ready-to-read credit counter and the write credit counter are non-zero;
wherein the ready-to-read credit counter is configured to decrement when the particular producer begins writing a buffer data unit into the corresponding consumer along a data connection,
the ready-to-read credit counter and the write credit counter are both configured to decrement when the particular producer begins writing the buffer data unit into the corresponding consumer along the data connection,
the ready-to-read credit counter is configured to increment when the particular producer receives from the corresponding consumer a read ready token along a control connection, the read ready token indicating that the corresponding consumer has freed a buffer data unit and is ready to receive an additional buffer data unit; and
the write credit counter is configured to increment when the particular producer receives from the corresponding consumer a write done token along the control connection, the write done token indicating that the writing of the buffer data unit into the corresponding consumer has completed.
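
The credit rules recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented implementation: the class name `ProducerCredits`, its method names, and the example values (one initial write credit, a consumer buffer depth of 2) are hypothetical and chosen for illustration. The model covers a single producer and its corresponding consumer, and it treats the read ready and write done tokens as plain method calls rather than signals on a control connection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProducerCredits:
    """Hypothetical model of the two credit counters held by one producer."""
    write_credits: int          # initialized with one or more write credits
    ready_to_read_credits: int  # initialized to the consumer's buffer depth

    def can_write(self) -> bool:
        # Writing is only permitted when both counters are non-zero.
        return self.write_credits > 0 and self.ready_to_read_credits > 0

    def begin_write(self) -> None:
        # Both counters decrement when the producer begins writing
        # a buffer data unit into the consumer.
        if not self.can_write():
            raise RuntimeError("write blocked: a credit counter is zero")
        self.write_credits -= 1
        self.ready_to_read_credits -= 1

    def on_write_done(self) -> None:
        # Write done token: the write into the consumer has completed.
        self.write_credits += 1

    def on_read_ready(self) -> None:
        # Read ready token: the consumer has freed a buffer data unit.
        self.ready_to_read_credits += 1


def demo() -> List[bool]:
    """Record whether the producer may write after each protocol step."""
    producer = ProducerCredits(write_credits=1, ready_to_read_credits=2)
    trace = [producer.can_write()]      # both counters non-zero
    producer.begin_write()              # write=0, ready-to-read=1
    trace.append(producer.can_write())  # blocked: write in flight
    producer.on_write_done()            # write=1, ready-to-read=1
    trace.append(producer.can_write())
    producer.begin_write()              # write=0, ready-to-read=0
    producer.on_write_done()            # write=1, ready-to-read=0
    trace.append(producer.can_write())  # blocked: consumer is full
    producer.on_read_ready()            # write=1, ready-to-read=1
    trace.append(producer.can_write())
    return trace


if __name__ == "__main__":
    print(demo())
```

In this model the write credit counter limits how many writes are in flight at once, while the ready-to-read credit counter tracks free space in the consumer, so the producer stalls when either the previous write has not completed or the consumer has no free buffer data unit.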