CPC G06F 15/7885 (2013.01) [G06F 15/7839 (2013.01); G06F 16/9024 (2019.01); G06F 17/16 (2013.01)] | 15 Claims |
1. A data processing system configured to receive a graph with a sequence of layers, comprising:
a host executing a runtime logic configured to
execute a first forward subgraph in a sequence of forward subgraphs of the graph in a first forward topology of tiling configurations to forward propagate a first set of input tiles through a first input layer and generate a first set of intermediate tiles, forward propagate the first set of intermediate tiles through a first intermediate layer and generate a first set of further intermediate tiles, and forward propagate the first set of further intermediate tiles through a first output layer and generate a first set of non-overlapping output tiles; and
execute a first backward subgraph in a sequence of backward subgraphs of the graph in a first backward topology of tiling configurations to backward propagate a first set of non-overlapping input gradient tiles through a first backpropagation input layer and generate (i) a first set of intermediate gradient tiles and (ii) first input weight gradients for the first output layer, backward propagate the first set of intermediate gradient tiles through a first backpropagation intermediate layer and generate (i) a first set of further intermediate gradient tiles and (ii) first intermediate weight gradients for the first intermediate layer, and backward propagate the first set of further intermediate gradient tiles through a first backpropagation output layer and generate (i) a first set of output gradient tiles and (ii) first output weight gradients for the first input layer, wherein
gradient tiles in the first set of further intermediate gradient tiles share overlapping regions with adjacent gradient tiles in the first set of further intermediate gradient tiles; the runtime logic is further configured to store the gradient tiles in the first set of further intermediate gradient tiles such that the overlapping regions are redundantly localized in each of the gradient tiles in the first set of further intermediate gradient tiles to form a first set of standalone further intermediate gradient tiles with no overlaps; and
the runtime logic is further configured to read the first set of standalone further intermediate gradient tiles on a tile-by-tile basis to generate the first set of output gradient tiles and/or the first output weight gradients.
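The sketch below may help make the "standalone tile" idea in claim 1 concrete. It is a minimal, hypothetical one-dimensional example and not the claimed implementation. It assumes the backpropagation output layer can be modelled as a valid 1D cross-correlation with a kernel of size `k`. Under that assumption, each non-overlapping output gradient tile needs `k - 1` extra "halo" elements from the further intermediate gradient tiles, and adjacent tiles therefore share those elements. The function names `make_standalone_tiles`, `backprop_tile` and `tiled_backprop` are illustrative only. The shared halo is copied into each tile's own buffer, so the tiles have no overlaps in storage and can be read and processed one at a time.

```python
from typing import List


def make_standalone_tiles(grad: List[float], tile: int, halo: int) -> List[List[float]]:
    """Split `grad` into tiles that each carry their own copy of the halo.

    Hypothetical helper: adjacent tiles overlap by `halo` elements, and that
    overlap is duplicated into every tile's buffer. No tile then needs to
    read from its neighbour.
    """
    n_out = len(grad) - halo  # outputs produced by a valid sliding window
    tiles = []
    for start in range(0, n_out, tile):
        end = min(start + tile, n_out) + halo
        tiles.append(list(grad[start:end]))  # redundant copy of the shared region
    return tiles


def backprop_tile(tile_buf: List[float], kernel: List[float]) -> List[float]:
    """Assumed layer model: valid 1D cross-correlation of one buffer with `kernel`."""
    k = len(kernel)
    return [sum(tile_buf[i + j] * kernel[j] for j in range(k))
            for i in range(len(tile_buf) - k + 1)]


def tiled_backprop(grad: List[float], kernel: List[float], tile: int) -> List[float]:
    """Read the standalone tiles one by one and concatenate the output tiles."""
    halo = len(kernel) - 1
    out: List[float] = []
    for buf in make_standalone_tiles(grad, tile, halo):  # tile-by-tile read
        out.extend(backprop_tile(buf, kernel))  # non-overlapping output tile
    return out


if __name__ == "__main__":
    g = [1.0, 2, 3, 4, 5, 6, 7, 8]
    kern = [1.0, 0, -1]
    print(make_standalone_tiles(g, 3, 2))  # the halo [4, 5] appears in both tiles
    print(tiled_backprop(g, kern, 3) == backprop_tile(g, kern))  # True
```

In this sketch the per-tile results concatenate to exactly the untiled result. Each tile costs `halo` extra elements of storage, and in exchange the tiles can be consumed independently and in any order.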