US 12,430,109 B2
Critical stage optimization for reconfigurable architectures
Adam Bordelon, Palo Alto, CA (US); and David Alan Koeplinger, Egg Harbor, NJ (US)
Assigned to SambaNova Systems, Inc., Palo Alto, CA (US)
Filed by SambaNova Systems, Inc., Palo Alto, CA (US)
Filed on Feb. 28, 2023, as Appl. No. 18/115,118.
Claims priority of provisional application 63/314,993, filed on Feb. 28, 2022.
Prior Publication US 2023/0273879 A1, Aug. 31, 2023
Int. Cl. G06F 8/41 (2018.01); G06F 15/78 (2006.01); G06F 15/82 (2006.01)

CPC G06F 8/4441 (2013.01) [G06F 15/7867 (2013.01); G06F 15/825 (2013.01)]

20 Claims

1. A method for reducing latency and increasing throughput in a reconfigurable computing system, the method comprising:

receiving a user program for execution on a reconfigurable dataflow computing system, the reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array, the user program comprising a plurality of tensor-based algebraic expressions;

converting the plurality of tensor-based algebraic expressions to an intermediate representation comprising a plurality of stages, including a first stage and a second stage which is adjacent to the first stage, each stage comprising one or more logical operations executable via dataflow through one or more compute units of the grid of compute units, each stage preceded by and followed by a buffer, each buffer corresponding to one or more memory units within the grid of memory units;

detecting a memory mapping operation within the first stage; and

moving the memory mapping operation to the second stage;

wherein the memory mapping operation is executable by the one or more memory units within the second stage and wherein dataflow through the buffer is controlled by one or more memory units within the grid of memory units.
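
The detecting and moving steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Op` and `Stage` classes, the `MEMORY_MAPPING_KINDS` set, and the function names are all hypothetical, and the sketch assumes that the second stage directly follows the first and that placing the moved operation at its front preserves program order. Buffers and the mapping onto memory units are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical operation kinds treated as memory mapping operations (assumption).
MEMORY_MAPPING_KINDS = {"transpose", "reshape", "gather"}


@dataclass
class Op:
    name: str
    kind: str  # e.g. "gemm", "relu", "transpose"


@dataclass
class Stage:
    name: str
    ops: list = field(default_factory=list)


def find_memory_mapping_op(stage):
    """Return the first memory mapping operation in the stage, or None."""
    for op in stage.ops:
        if op.kind in MEMORY_MAPPING_KINDS:
            return op
    return None


def move_memory_mapping_op(first, second):
    """Move one memory mapping op from `first` to the adjacent `second` stage.

    Returns the moved op, or None if `first` contains no such op.
    """
    op = find_memory_mapping_op(first)
    if op is None:
        return None
    first.ops.remove(op)
    # Assumption: `second` follows `first`, so the op goes to its front.
    second.ops.insert(0, op)
    return op


# Usage: a first stage holding a compute op and a transpose, and an adjacent stage.
first = Stage("stage0", [Op("mm0", "gemm"), Op("t0", "transpose")])
second = Stage("stage1", [Op("relu0", "relu")])
moved = move_memory_mapping_op(first, second)
```

After the call, the first stage holds only its compute operation, and the adjacent stage begins with the relocated memory mapping operation. This is the sense in which the move relieves the first stage.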