| CPC G06F 8/453 (2013.01) [G06F 8/443 (2013.01); G06F 8/451 (2013.01)] | 58 Claims |

|
1. A computer-implemented method, comprising:
receiving a machine learning (ML) network model comprising a plurality of ML operations in high-level code;
partitioning the ML network model into a plurality of sub-graphs;
generating an internal representation (IR) for each sub-graph of the plurality of sub-graphs, wherein the IR is mapped to one or more components in a multi-processing tile device;
identifying two or more processing tiles of the multi-processing tile device having a same dimension for an input tensor data as one another performing a same primitive function based on the IR;
determining whether the two or more processing tiles of the multi-processing tile device have a same dimension for their respective output tensor data for the same primitive function based on the IR;
responsive to determining that the two or more processing tiles have the same dimension for their respective output tensor data for the same primitive function, allocating a same memory address range within a respective on-chip memory (OCM) of the two or more processing tiles for the same primitive function;
linking the memory address range within the respective OCM of the two or more processing tiles to one another to form a grouped memory space within the respective OCM of the two or more processing tiles; and
compiling each sub-graph of the plurality of sub-graphs based on the linking.
|
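The identifying, determining, allocating, and linking steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `TileOp`, `group_and_allocate`, `out_bytes`, and `base_addr` are hypothetical, and the bump-style address assignment is an assumption made only for illustration. The sketch groups tiles whose IR entries share a primitive function, input tensor dimension, and output tensor dimension, then gives every tile in a group the same OCM address range.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TileOp:
    """Hypothetical IR record: one primitive function mapped to one tile."""
    tile_id: int
    primitive: str
    in_shape: tuple
    out_shape: tuple
    out_bytes: int  # assumed size of the output tensor in the tile's OCM


def group_and_allocate(ops, base_addr=0):
    """Assign one shared OCM address range per group of matching tiles.

    Tiles match when they run the same primitive with the same input and
    output tensor dimensions. Groups with fewer than two tiles are skipped.
    """
    groups = defaultdict(list)
    for op in ops:
        groups[(op.primitive, op.in_shape, op.out_shape)].append(op)

    allocations = {}
    next_addr = base_addr
    for key, members in groups.items():
        if len(members) < 2:
            continue  # nothing to link for a single tile
        size = max(m.out_bytes for m in members)
        # The same (start, end) range is used inside each member tile's OCM;
        # recording the member tiles together models the "linked" group.
        allocations[key] = {
            "range": (next_addr, next_addr + size),
            "tiles": sorted(m.tile_id for m in members),
        }
        next_addr += size  # simple bump allocation (assumption)
    return allocations
```

For example, two tiles running a `conv` primitive with identical input and output shapes would receive one shared range, while a third tile with a different output shape would be left for separate allocation.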