| CPC G06F 15/825 (2013.01) [G06F 8/433 (2013.01); G06F 8/4441 (2013.01); G06F 15/17375 (2013.01); G06F 17/16 (2013.01)] | 19 Claims |

|
1. A computing system, the system comprising:
a host computer comprising a graph optimization module configured to conduct a method comprising:
receiving a compute graph for execution on multiple reconfigurable dataflow processors (RDPs), the multiple RDPs being interconnected with a ring network, the ring network having R interconnected RDPs, including a first RDP, and a second RDP adjacent to the first RDP in the ring network, wherein R is an integer value;
detecting a node of the compute graph that specifies a reduction operation for a first tensor and a second tensor;
partitioning the node of the compute graph into a compute subgraph corresponding to the first RDP;
inserting a first inserted node into the compute subgraph that specifies a partial reduction operation for producing a partial reduction result corresponding to a shard of the first tensor and a shard of the second tensor;
inserting a second inserted node into the compute subgraph for communicating the partial reduction result to the second RDP;
inserting a third inserted node into the compute subgraph that specifies a reduction operation for producing a total reduction result for the first tensor and the second tensor; and
inserting a fourth inserted node into the compute subgraph for communicating the total reduction result to the first RDP.
|
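The graph rewrite recited in claim 1 can be illustrated with a minimal sketch. It is one possible reading of the claim language, not the patented implementation. The `Node` class, the `partition_reduction` and `simulate_ring_dot` functions, the operation names, and the choice of a dot product as the reduction are all hypothetical and introduced only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute-graph node: a name, an op kind, and attributes."""
    name: str
    op: str
    attrs: dict = field(default_factory=dict)


def partition_reduction(node: Node, first: int, ring_size: int) -> list[Node]:
    """Build the compute subgraph for the RDP at ring position `first`.

    Returns the four inserted nodes in the order the claim lists them.
    """
    if node.op != "reduce":
        raise ValueError("this sketch only partitions reduction nodes")
    second = (first + 1) % ring_size  # RDP adjacent to `first` in the ring
    return [
        # first inserted node: partial reduction over one shard of each tensor
        Node(f"{node.name}.partial", "partial_reduce", {"shard": first}),
        # second inserted node: communicate the partial result to the second RDP
        Node(f"{node.name}.send_partial", "send", {"dst": second}),
        # third inserted node: reduction producing the total result
        Node(f"{node.name}.total", "reduce"),
        # fourth inserted node: communicate the total result to the first RDP
        Node(f"{node.name}.send_total", "send", {"dst": first}),
    ]


def simulate_ring_dot(a: list[float], b: list[float], ring_size: int) -> list[float]:
    """Numerically simulate the data flow, assuming the reduction is a dot product.

    Each simulated RDP reduces its own shard, a running sum travels around
    the ring one hop at a time, and the total is then handed to every RDP.
    """
    if len(a) != len(b) or len(a) % ring_size:
        raise ValueError("tensors must match and shard evenly in this sketch")
    step = len(a) // ring_size
    partials = [
        sum(x * y for x, y in zip(a[r * step:(r + 1) * step],
                                  b[r * step:(r + 1) * step]))
        for r in range(ring_size)
    ]
    running = 0.0
    for r in range(ring_size):  # one hop per RDP around the ring
        running += partials[r]
    return [running] * ring_size  # every RDP ends up holding the total
```

For example, `partition_reduction(Node("dot", "reduce"), first=0, ring_size=4)` yields nodes whose operations are partial reduce, send, reduce, send, with the partial result addressed to ring position 1 and the total result addressed back to position 0.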