US 12,079,156 B2
Lossless tiling in convolution networks—materialization of tensors
Tejas Nagendra Babu Nama, Sunnyvale, CA (US); Ruddhi Chaphekar, Santa Clara, CA (US); Ram Sivaramakrishnan, San Jose, CA (US); Raghu Prabhakar, San Jose, CA (US); Sumti Jairath, Santa Clara, CA (US); Junjue Wang, San Mateo, CA (US); Kaizhao Liang, Palo Alto, CA (US); Adi Fuchs, West Windsor, NJ (US); Matheen Musaddiq, Austin, TX (US); and Arvind Krishna Sujeeth, San Francisco, CA (US)
Assigned to SambaNova Systems, Inc., Palo Alto, CA (US)
Filed by SambaNova Systems, Inc., Palo Alto, CA (US)
Filed on Jul. 23, 2021, as Appl. No. 17/384,515.
Application 17/384,515 is a continuation of application No. 17/216,657, filed on Mar. 29, 2021, granted, now Pat. No. 11,263,170.
Prior Publication US 2022/0309028 A1, Sep. 29, 2022
Int. Cl. G06F 15/78 (2006.01); G06F 16/901 (2019.01); G06F 17/16 (2006.01)
CPC G06F 15/7885 (2013.01) [G06F 15/7839 (2013.01); G06F 16/9024 (2019.01); G06F 17/16 (2013.01)]
16 Claims
 
1. A data processing system, comprising:
a plurality of reconfigurable processors;
processor memory operatively coupled to the plurality of reconfigurable processors; and
runtime logic, operatively coupled to the plurality of reconfigurable processors and the processor memory, configured to
configure at least one reconfigurable processor in the plurality of reconfigurable processors with a first subgraph in a sequence of subgraphs of a graph;
load an input onto the processor memory;
on a tile-by-tile basis, process a first set of input tiles from the input through the first subgraph and generate a first set of intermediate tiles, load the first set of intermediate tiles onto the processor memory, and process the first set of intermediate tiles through the first subgraph and generate a first set of output tiles;
compose output tiles in the first set of output tiles into a first composed input, and load the first composed input onto the processor memory;
configure at least one reconfigurable processor in the plurality of reconfigurable processors with a second subgraph in the sequence of subgraphs;
on the tile-by-tile basis, process a second set of input tiles from the first composed input through the second subgraph and generate a second set of intermediate tiles, load the second set of intermediate tiles onto the processor memory, and process the second set of intermediate tiles through the second subgraph and generate a second set of output tiles; and
compose output tiles in the second set of output tiles into a second composed input, and load the second composed input onto the processor memory.
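
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation and rests on several simplifying assumptions: the "processor memory" is modeled as a plain dictionary, each subgraph is assumed to have exactly two stages, the stages are elementwise operations, and the tiles do not overlap (a real convolution would generally need overlapping tiles and padding handling). All names (`split_into_tiles`, `compose_tiles`, `run_subgraph`, the memory keys) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Tile = List[List[int]]
Stage = Callable[[Tile], Tile]


def split_into_tiles(tensor: Tile, tile_h: int, tile_w: int) -> Dict[Tuple[int, int], Tile]:
    """Cut a 2D tensor into non-overlapping tiles keyed by their (row, col) origin."""
    tiles = {}
    for r in range(0, len(tensor), tile_h):
        for c in range(0, len(tensor[0]), tile_w):
            tiles[(r, c)] = [row[c:c + tile_w] for row in tensor[r:r + tile_h]]
    return tiles


def compose_tiles(tiles: Dict[Tuple[int, int], Tile], height: int, width: int) -> Tile:
    """Stitch tiles back into one tensor (the 'composed input' for the next subgraph)."""
    out = [[0] * width for _ in range(height)]
    for (r, c), tile in tiles.items():
        for i, row in enumerate(tile):
            out[r + i][c:c + len(row)] = row
    return out


def elementwise(fn: Callable[[int], int]) -> Stage:
    """Build a stage that applies fn to every element of a tile (assumed stand-in for a layer)."""
    return lambda tile: [[fn(v) for v in row] for row in tile]


def run_subgraph(memory: dict, input_key: str, output_key: str,
                 stages: Tuple[Stage, Stage], tile_shape: Tuple[int, int]) -> None:
    """Process one subgraph tile by tile, then store the composed result in memory."""
    tensor = memory[input_key]
    height, width = len(tensor), len(tensor[0])
    first_stage, second_stage = stages

    # Input tiles -> intermediate tiles, which are loaded onto "processor memory".
    input_tiles = split_into_tiles(tensor, *tile_shape)
    memory[output_key + "/intermediate_tiles"] = {
        pos: first_stage(tile) for pos, tile in input_tiles.items()
    }

    # Intermediate tiles -> output tiles, still within the same subgraph.
    output_tiles = {
        pos: second_stage(tile)
        for pos, tile in memory[output_key + "/intermediate_tiles"].items()
    }

    # Compose the output tiles and load the composed tensor onto memory.
    memory[output_key] = compose_tiles(output_tiles, height, width)


# Hypothetical two-subgraph sequence over a 4x4 input with 2x2 tiles.
memory = {"input": [[r * 4 + c for c in range(4)] for r in range(4)]}
subgraph_1 = (elementwise(lambda v: v * 2), elementwise(lambda v: v + 1))
subgraph_2 = (elementwise(lambda v: v - 10), elementwise(lambda v: max(0, v)))

run_subgraph(memory, "input", "composed_1", subgraph_1, (2, 2))
run_subgraph(memory, "composed_1", "composed_2", subgraph_2, (2, 2))
```

In this sketch the second call reads its input tiles from `composed_1`, mirroring how the second subgraph in the claim consumes the first composed input. Because the stages are elementwise, the tiled result equals what an untiled pass would produce; that equivalence is a property of this simplified example, not a statement about how the patent achieves lossless tiling for convolutions.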