| CPC G06N 3/06 (2013.01) | 15 Claims |

|
1. A device for determining a parallel computation scheme for a neural network, the device comprising at least one processor configured to:
receive a computation graph for the neural network;
transform the computation graph into a recursive dataflow graph comprising a plurality of recursive subgraphs, wherein each of the recursive subgraphs is respectively a tuple of another of the recursive subgraphs and an operator node;
determine a number of partitioning recursions based on a number of parallel computing devices;
for each of the partitioning recursions:
determine a plurality of costs corresponding to a plurality of operator nodes associated with the recursive dataflow graph,
determine a processing order of the plurality of recursive subgraphs based on a descending order of the plurality of costs,
process the plurality of recursive subgraphs in the determined processing order, wherein processing a recursive subgraph, of the plurality of recursive subgraphs, comprises selecting a partitioning axis for tensors associated with an operator node of the recursive subgraph;
output a partitioning scheme comprising partitioning axes for each of the tensors associated with the plurality of operator nodes; and
wherein the at least one processor is configured to select the partitioning axis for the tensors associated with the operator node based on an inter-operator communication cost comprising an amount of data to be communicated between the parallel computing devices for executing a neighboring operator node based on a shared tensor between the operator node and the neighboring operator node or for executing the operator node based on an output of the neighboring operator node.
|
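The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, only one possible reading under several assumptions that do not come from the claim: the graph is a linear chain, each operator owns a single output tensor, each recursion splits a tensor in two (so `2**k` devices need `k` recursions), an operator's cost is the element count of its tensor, and the inter-operator communication cost is the element count of a shared tensor whenever a neighbor has already chosen a different axis. All names (`Op`, `RecSubgraph`, `to_recursive`, `unroll`, `num_recursions`, `comm_cost`, `plan`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    name: str
    shape: tuple  # shape of the operator's output tensor (hypothetical model)


@dataclass
class RecSubgraph:
    rest: Optional["RecSubgraph"]  # another recursive subgraph, or None at the base
    op: Op                         # the operator node paired with it


def to_recursive(ops):
    """Fold a linear chain of operators into nested (subgraph, operator) tuples."""
    graph = None
    for op in ops:
        graph = RecSubgraph(graph, op)
    return graph


def unroll(graph):
    """List every recursive subgraph, outermost first."""
    subs = []
    while graph is not None:
        subs.append(graph)
        graph = graph.rest
    return subs


def num_recursions(n_devices):
    # Assumption: each recursion halves the tensors, so 2**k devices need k recursions.
    return int(math.log2(n_devices))


def comm_cost(axis, neighbors, chosen):
    """Toy inter-operator cost: elements of each shared tensor that would have to
    move between devices because a neighbor already chose a different axis."""
    return sum(elems for name, elems in neighbors
               if chosen.get(name) is not None and chosen[name] != axis)


def plan(ops, n_devices):
    subs = unroll(to_recursive(ops))
    succ = {s.rest.op.name: s.op for s in subs if s.rest is not None}
    shapes = {op.name: list(op.shape) for op in ops}
    scheme = {op.name: [] for op in ops}
    for _ in range(num_recursions(n_devices)):
        # Assumption: an operator's cost is the current element count of its tensor.
        costs = {name: math.prod(shape) for name, shape in shapes.items()}
        # Process subgraphs in descending order of operator cost.
        order = sorted(subs, key=lambda s: costs[s.op.name], reverse=True)
        chosen = {}
        for sub in order:
            name = sub.op.name
            neighbors = []
            if sub.rest is not None:  # shared tensor: the predecessor's output
                neighbors.append((sub.rest.op.name, costs[sub.rest.op.name]))
            if name in succ:          # shared tensor: this operator's output
                neighbors.append((succ[name].name, costs[name]))
            # Only axes whose current size splits evenly in two are candidates.
            axes = [a for a, size in enumerate(shapes[name]) if size % 2 == 0]
            if not axes:
                raise ValueError(f"no splittable axis left for {name}")
            chosen[name] = min(axes, key=lambda a: (comm_cost(a, neighbors, chosen), a))
        for name, axis in chosen.items():
            shapes[name][axis] //= 2   # each device now holds half along that axis
            scheme[name].append(axis)  # one partitioning axis per recursion
    return scheme


ops = [Op("a", (4, 8)), Op("b", (2, 8)), Op("c", (4, 8))]
print(plan(ops, 4))  # {'a': [0, 0], 'b': [0, 1], 'c': [0, 0]}
```

In this toy run, operator `b` is forced onto axis 1 in the second recursion because its axis 0 has shrunk to size 1, which is the situation where a nonzero communication cost with its neighbors arises. A fuller cost model would also weigh intra-operator costs and non-chain graph topologies, neither of which is modeled here.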