US 12,112,265 B2
Architecture for running convolutional networks on memory and MIPS constrained embedded devices
Raka Singh, Bangalore (IN); Neeraj Pai, Bangalore (IN); Swastik Mahapatra, Bhubaneshwar (IN); and Anil M Sripadarao, Bangalore (IN)
Assigned to Analog Devices International Unlimited Company, Limerick (IE)
Filed by Analog Devices International Unlimited Company, Limerick (IE)
Filed on Dec. 18, 2020, as Appl. No. 17/127,560.
Prior Publication US 2022/0198257 A1, Jun. 23, 2022
Int. Cl. G06N 3/08 (2023.01); G06F 7/483 (2006.01); G06F 12/0802 (2016.01); G06N 3/02 (2006.01)
CPC G06N 3/08 (2013.01) [G06F 7/483 (2013.01); G06F 12/0802 (2013.01); G06N 3/02 (2013.01); G06F 2212/60 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A method for configuring a deep neural network (DNN) to run on a resource constrained embedded device, the method comprising:
accessing DNN information including definition of layers and weights of the DNN;
obtaining cache or memory information for one or more cache or memory levels of the resource constrained embedded device;
configuring the DNN to be loaded onto the one or more cache or memory levels of the resource constrained embedded device based on the cache or memory information and the DNN information;
adjusting one or more weights of the DNN by a division factor;
storing, in a configuration file in association with each of the one or more weights, a correction factor corresponding to the division factor used to adjust the one or more weights of the DNN;
selecting, based on the cache or memory information, between a plurality of DNN processing schemes for loading the DNN information and data onto the resource constrained embedded device, wherein a first of the plurality of DNN processing schemes causes a sub-portion of one of the layers interleaved with the weights and data to be loaded onto a single cache or memory level of the one or more cache or memory levels, and wherein a second of the plurality of DNN processing schemes causes a complete channel corresponding to the layers and the weights and data to be loaded onto the single cache or memory level, wherein selecting the first of the plurality of DNN processing schemes comprises:
computing a first size corresponding to a first subset of rows and columns of the data across all of the layers;
determining that the first size fits within a level 1 cache of the resource constrained embedded device;
causing a second subset of rows and columns of the data across all of the layers to be read into the level 1 cache while the first subset is being processed; and
processing the second subset of the rows and columns based on an output of processing the first subset of the rows and columns, wherein one extra row of a set of data is retrieved from a level 3 cache into the level 1 cache and two rows that are repeated from a current iteration are reused for a next iteration;
selecting, based on the layers of the DNN, a convolution technique from a plurality of convolution techniques;
processing a given set of data based on the DNN by applying the adjusted one or more weights comprising one or more fractional weights to the given set of data to produce output weighted data;
retrieving the configuration file to obtain the correction factor corresponding to the division factor used to adjust the one or more weights of the DNN; and
multiplying the output weighted data by the correction factor to recover a result corresponding to one or more original weight values of the DNN.
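The weight-adjustment and recovery steps recited above (adjusting weights by a division factor, storing a correction factor in association with the weights, and multiplying the weighted output to recover the original-weight result) can be illustrated with a minimal C sketch. The identifiers, the float representation, and the division factor of 8 are assumptions for illustration, not taken from the specification.

/* Illustrative sketch of the weight division/correction scheme of claim 1.
 * All names and values here are assumptions, not the patent's code. */
#include <stdio.h>

#define NUM_WEIGHTS 4

/* Configuration-time step: divide each weight by a division factor so the
 * weights become fractional, and record the corresponding correction
 * factor (here the division factor itself) for storage in a config file. */
static void scale_weights(float *w, int n, float div_factor,
                          float *correction_factor)
{
    for (int i = 0; i < n; i++)
        w[i] /= div_factor;          /* adjusted, now fractional, weights */
    *correction_factor = div_factor; /* stored in association with the weights */
}

/* Inference-time step: apply the adjusted weights to the data, then
 * multiply the weighted output by the stored correction factor to recover
 * the result the original weights would have produced. */
static float weighted_sum(const float *w, const float *x, int n,
                          float correction_factor)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += w[i] * x[i];
    return acc * correction_factor;
}

int main(void)
{
    float w[NUM_WEIGHTS] = { 3.0f, -2.5f, 4.0f, 1.5f }; /* magnitudes > 1 */
    float x[NUM_WEIGHTS] = { 0.5f, 1.0f, -0.25f, 2.0f };
    float corr;

    scale_weights(w, NUM_WEIGHTS, 8.0f, &corr); /* all |w| now < 1 */
    printf("result = %f\n", weighted_sum(w, x, NUM_WEIGHTS, corr));
    return 0;
}

Dividing the weights keeps their magnitudes below one, which suits the fractional number formats of many embedded processors, and a single multiply by the stored correction factor per output recovers the result corresponding to the original weight values.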
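The first processing scheme (checking that a subset of rows fits in the level 1 cache, reading the next subset in while the current one is processed, and retrieving one extra row from level 3 while reusing two repeated rows) might look like the following C sketch for a 3x3 convolution. The buffer sizes, the L1_BYTES constant, and memcpy as a stand-in for a DMA prefetch are assumptions.

#include <stddef.h>
#include <string.h>

#define W 16   /* row width (columns) */
#define H 8    /* number of input rows */
#define K 3    /* 3x3 convolution kernel */

/* Assumed L1 data-cache size; a real port would query the device. */
#define L1_BYTES (32 * 1024)

static float l3_image[H][W];  /* stands in for data resident in L3 */
static float l1_rows[K][W];   /* K-row working set intended to fit in L1 */

/* Scheme selection: the first scheme applies only if the row subset fits
 * in L1; otherwise the second scheme (loading a complete channel, not
 * shown here) would be chosen instead. */
static int row_subset_fits_l1(size_t bytes_per_row, int rows)
{
    return bytes_per_row * (size_t)rows <= (size_t)L1_BYTES;
}

/* Stand-in for a DMA prefetch of one row from L3 into L1. */
static void fetch_row(int r, float *dst)
{
    memcpy(dst, l3_image[r], sizeof(float) * W);
}

/* Valid 3x3 convolution over the K rows currently held in L1. */
static void conv_row(float *out, float rows[K][W], const float k[K][K])
{
    for (int c = 0; c + K <= W; c++) {
        float acc = 0.0f;
        for (int i = 0; i < K; i++)
            for (int j = 0; j < K; j++)
                acc += k[i][j] * rows[i][c + j];
        out[c] = acc;
    }
}

void conv2d_row_tiled(float out[H - K + 1][W - K + 1], const float k[K][K])
{
    if (!row_subset_fits_l1(sizeof(float) * W, K))
        return;                       /* fall back to the second scheme */

    for (int i = 0; i < K; i++)       /* prime the L1 working set */
        fetch_row(i, l1_rows[i]);

    for (int r = 0; r + K <= H; r++) {
        conv_row(out[r], l1_rows, k); /* process the current subset */
        if (r + K < H) {
            /* Two rows are repeated for the next iteration; only one
             * extra row must be pulled in from L3. */
            memmove(l1_rows[0], l1_rows[1], sizeof(float) * W * (K - 1));
            fetch_row(r + K, l1_rows[K - 1]);
        }
    }
}

On hardware, fetch_row would be an asynchronous DMA so that the second subset of rows is read into L1 while the first subset is being processed, as the claim recites; the sketch performs the two steps sequentially for clarity.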
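Selecting a convolution technique per layer, as recited, can be sketched as a simple dispatch in C. The specific techniques named below and the working-set threshold are illustrative assumptions; the claim recites only that a technique is chosen from a plurality based on the layers of the DNN.

typedef struct {
    int kernel_h, kernel_w;   /* spatial kernel size */
    int stride;
    int in_channels;
} conv_layer_t;

typedef enum {
    CONV_DIRECT,        /* straight nested loops */
    CONV_1X1_GEMM,      /* pointwise layer lowered to a matrix multiply */
    CONV_IM2COL_GEMM    /* general layer lowered to im2col + GEMM */
} conv_technique_t;

/* Per-layer dispatch; the 1x1 test and the size threshold are
 * illustrative heuristics, not the patent's criteria. */
static conv_technique_t select_conv_technique(const conv_layer_t *l)
{
    if (l->kernel_h == 1 && l->kernel_w == 1)
        return CONV_1X1_GEMM;
    if (l->in_channels * l->kernel_h * l->kernel_w <= 64)
        return CONV_DIRECT;  /* small window: direct loops stay in cache */
    return CONV_IM2COL_GEMM;
}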