US 12,450,486 B2
Depth-first deep convolutional neural network inference
Piero Zappi, La Jolla, CA (US); Jin Won Lee, San Diego, CA (US); Christopher Lott, San Diego, CA (US); and Rexford Alan Hill, San Diego, CA (US)
Assigned to QUALCOMM Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Dec. 14, 2020, as Appl. No. 17/121,499.
Claims priority of provisional application 62/948,113, filed on Dec. 13, 2019.
Prior Publication US 2021/0182684 A1, Jun. 17, 2021
Int. Cl. G06N 3/082 (2023.01); G06F 9/48 (2006.01); G06F 9/50 (2006.01); G06N 3/04 (2023.01); G06N 3/045 (2023.01); G06N 3/10 (2006.01)
CPC G06N 3/082 (2013.01) [G06F 9/4881 (2013.01); G06N 3/04 (2013.01); G06F 9/5066 (2013.01); G06F 2209/485 (2013.01)]
26 Claims
1. A method performed by a computing device comprising a processor, on-chip memory, and off-chip memory, the method comprising:
determining a first partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device, the first partition comprising a set of consecutive layers of the ANN, the first partition determined based on an amount of on-chip memory used by the first partition, an available amount of on-chip memory, and a size of data corresponding to a write back of intermediate activations to the off-chip memory, the amount of on-chip memory used by the first partition corresponding to a sum of a first amount of on-chip memory used for respective partial output of each layer of the first partition and a second amount of on-chip memory used for respective weights of each layer of the first partition, each partial output comprising a tile of one or more output activations generated in response to a corresponding portion of input activations received at a respective layer of the first partition, the tile being a spatial or channel-wise subset of total output activations associated with the respective layer; and
processing, at the computing device via the multi-layer ANN, an input, using the depth-first processing in accordance with the first partition, the depth-first processing comprising processing each tile associated with a respective portion of input activations through the set of consecutive layers of the first partition before processing a subsequent portion of input activations.
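The two steps of claim 1 can be illustrated with a small Python sketch. Everything in it is hypothetical. The Layer cost model, the byte figures, and the selection heuristic in choose_first_partition are assumptions for illustration, not the method recited in the patent. The heuristic considers every run of consecutive layers, starting at layer 0, whose on-chip footprint (sum of per-layer tile outputs plus weights) fits the available memory. Among those runs, it keeps the one whose boundary write-back to off-chip memory is smallest and prefers the longer run on ties. The layers are pointwise so that spatial tiles are independent. Real convolutions would require each input tile to carry overlapping halo rows.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Layer:
    """Hypothetical per-layer cost model; all sizes are non-negative byte counts."""
    name: str
    weight_bytes: int              # on-chip bytes for this layer's weights
    tile_out_bytes: int            # on-chip bytes for one tile of this layer's output
    full_out_bytes: int            # bytes written off-chip if a partition ends here
    fn: Callable[[float], float]   # pointwise op, so spatial tiles are independent


def choose_first_partition(layers: Sequence[Layer], on_chip_bytes: int) -> int:
    """Return how many consecutive layers, starting at layer 0, form the first partition.

    Assumed heuristic: among prefixes whose footprint (tile outputs + weights)
    fits on chip, keep the one with the smallest boundary write-back,
    preferring the longer prefix on ties.
    """
    best_len = 0
    best_writeback: Optional[int] = None
    used = 0
    for i, layer in enumerate(layers):
        used += layer.tile_out_bytes + layer.weight_bytes
        if used > on_chip_bytes:
            break  # sizes are non-negative, so longer prefixes cannot fit either
        if best_writeback is None or layer.full_out_bytes <= best_writeback:
            best_len, best_writeback = i + 1, layer.full_out_bytes
    if best_len == 0:
        raise ValueError("the first layer alone does not fit in on-chip memory")
    return best_len


def run_depth_first(partition: Sequence[Layer], x: List[float], tile: int) -> List[float]:
    """Push each input tile through every layer of the partition before the next tile."""
    off_chip: List[float] = []         # stands in for off-chip memory
    for start in range(0, len(x), tile):
        t = x[start:start + tile]      # one spatial tile of input activations
        for layer in partition:        # intermediate tiles never leave this loop
            t = [layer.fn(v) for v in t]
        off_chip.extend(t)             # only the partition's final output is written back
    return off_chip


def run_layer_by_layer(partition: Sequence[Layer], x: List[float]) -> List[float]:
    """Reference breadth-first order: each layer consumes the whole previous output."""
    for layer in partition:
        x = [layer.fn(v) for v in x]
    return x


def relu(v: float) -> float:
    return max(0.0, v)


layers = [
    Layer("scale", 64, 32, 4096, lambda v: 2.0 * v),
    Layer("shift", 64, 32, 4096, lambda v: v - 3.0),
    Layer("relu", 0, 32, 2048, relu),
    Layer("bias", 1024, 32, 1024, lambda v: v + 1.0),
]
n = choose_first_partition(layers, on_chip_bytes=512)
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
print(n)                                        # 3
print(run_depth_first(layers[:n], x, tile=3))   # [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 5.0]
```

In this toy setup, the "bias" layer's weights alone exceed the 512-byte budget, so the first partition stops after "relu", the cheapest fitting boundary to write back. The depth-first and layer-by-layer orders produce identical results. They differ only in how much intermediate data must be held at once, which is the trade-off the partition choice is meant to manage.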