US 11,934,343 B2
Lossless tiling in convolution networks - backward pass
Tejas Nagendra Babu Nama, Sunnyvale, CA (US); Ruddhi Chaphekar, Santa Clara, CA (US); Ram Sivaramakrishnan, San Jose, CA (US); Raghu Prabhakar, San Jose, CA (US); Sumti Jairath, Santa Clara, CA (US); Junjue Wang, San Mateo, CA (US); Kaizhao Liang, Palo Alto, CA (US); Adi Fuchs, West Windsor, NJ (US); Matheen Musaddiq, Austin, TX (US); and Arvind Krishna Sujeeth, San Francisco, CA (US)
Assigned to SambaNova Systems, Inc., Palo Alto, CA (US)
Filed by SambaNova Systems, Inc., Palo Alto, CA (US)
Filed on Jul. 23, 2021, as Appl. No. 17/384,507.
Application 17/384,507 is a continuation of application No. 17/216,657, filed on Mar. 29, 2021, granted, now 11,263,170.
Prior Publication US 2022/0309027 A1, Sep. 29, 2022
Int. Cl. G06F 15/78 (2006.01); G06F 16/901 (2019.01); G06F 17/16 (2006.01)
CPC G06F 15/7885 (2013.01) [G06F 15/7839 (2013.01); G06F 16/9024 (2019.01); G06F 17/16 (2013.01)]
15 Claims
1. A data processing system configured to receive a graph with a sequence of layers, comprising:
a host executing a runtime logic configured to
execute a first forward subgraph in a sequence of forward subgraphs of the graph in a first forward topology of tiling configurations to forward propagate a first set of input tiles through a first input layer and generate a first set of intermediate tiles, forward propagate the first set of intermediate tiles through a first intermediate layer and generate a first set of further intermediate tiles, and forward propagate the first set of further intermediate tiles through a first output layer and generate a first set of non-overlapping output tiles; and
execute a first backward subgraph in a sequence of backward subgraphs of the graph in a first backward topology of tiling configurations to backward propagate a first set of non-overlapping input gradient tiles through a first backpropagation input layer and generate (i) a first set of intermediate gradient tiles and (ii) first input weight gradients for the first output layer, backward propagate the first set of intermediate gradient tiles through a first backpropagation intermediate layer and generate (i) a first set of further intermediate gradient tiles and (ii) first intermediate weight gradients for the first intermediate layer, and backward propagate the first set of further intermediate gradient tiles through a first backpropagation output layer and generate (i) a first set of output gradient tiles and (ii) first output weight gradients for the first input layer, wherein
gradient tiles in the first set of further intermediate gradient tiles share overlapping regions with adjacent gradient tiles in the first set of further intermediate gradient tiles;
the runtime logic is further configured to store the gradient tiles in the first set of further intermediate gradient tiles such that the overlapping regions are redundantly localized in each of the gradient tiles in the first set of further intermediate gradient tiles to form a first set of standalone further intermediate gradient tiles with no overlaps; and
the runtime logic is further configured to read the first set of standalone further intermediate gradient tiles on a tile-by-tile basis to generate the first set of output gradient tiles and/or the first output weight gradients.
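
The storage step recited in claim 1 can be illustrated informally as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the `tile` and `halo` parameters, and the dictionary keyed by tile origin are all hypothetical and do not come from the patent. Each tile receives its own copy of the region it shares with its neighbours, so a reader can later process one tile at a time without touching any other tile.

```python
def split_into_standalone_tiles(grid, tile, halo):
    """Cut a 2D grid (list of rows) into tiles that each carry their own
    copy of the overlap region shared with neighbouring tiles.

    tile: side length of each tile's core region (hypothetical parameter).
    halo: how many cells each tile extends into its neighbours
          (hypothetical parameter; clipped at the grid border).
    Returns a dict mapping the core origin (row, col) to the tile data.
    """
    rows, cols = len(grid), len(grid[0])
    tiles = {}
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r_lo, r_hi = max(r0 - halo, 0), min(r0 + tile + halo, rows)
            c_lo, c_hi = max(c0 - halo, 0), min(c0 + tile + halo, cols)
            # List slicing copies the cells, so the overlap is duplicated
            # into every tile instead of being shared between tiles.
            tiles[(r0, c0)] = [row[c_lo:c_hi] for row in grid[r_lo:r_hi]]
    return tiles


def read_tile_by_tile(tiles):
    """Yield (origin, tile) pairs one at a time, ordered by origin."""
    for origin in sorted(tiles):
        yield origin, tiles[origin]


# Example: a 4x4 grid cut into 2x2 cores with a one-cell halo.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = split_into_standalone_tiles(grid, tile=2, halo=1)
for origin, data in read_tile_by_tile(tiles):
    print(origin, data)
```

In this sketch the duplication costs extra storage (36 stored cells for a 16-cell grid), which is the trade-off for letting each tile be read independently.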