| CPC G06N 3/082 (2013.01) [G06F 9/4881 (2013.01); G06N 3/04 (2013.01); G06F 9/5066 (2013.01); G06F 2209/485 (2013.01)] | 26 Claims |

|
1. A method performed by a computing device comprising a processor, on-chip memory, and off-chip memory, the method comprising:
determining a first partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device, the first partition comprising a set of consecutive layers of the ANN, the first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of data corresponding to a write back of intermediate activations to the off-chip memory, the amount of on-chip memory used by the first partition corresponding to a sum of a first amount of on-chip memory used for respective partial output of each layer of the first partition and a second amount of on-chip memory used for respective weights of each layer of the first partition, each partial output comprising a tile of one or more output activations generated in response to a corresponding portion of input activations received at a respective layer of the first partition, the tile being a spatial or channel-wise subset of total output activations associated with the respective layer; and
processing, at the computing device via the multi-layer ANN, an input, using the depth-first processing in accordance with the first partition, the depth-first processing comprising processing each tile associated with a respective portion of input activations through the set of consecutive layers of the first partition before processing a subsequent portion of input activations.
|