CPC G06N 3/048 (2023.01) [G06F 9/4881 (2013.01); G06F 9/5016 (2013.01); G06F 18/2163 (2023.01); G06N 3/063 (2013.01); G06F 2209/5017 (2013.01); G06F 2209/506 (2013.01)] | 18 Claims |
1. A method for generating neural network program instructions for a neural network inference circuit to execute a neural network, the neural network inference circuit comprising a particular amount of available memory, the method comprising:
receiving a specification of the neural network comprising a plurality of layers;
determining (i) a required amount of weight memory for the neural network based on (1) a number of filters in the neural network and (2) a percentage of the weights of the neural network that are non-zero and (ii) required amounts of activation memory for each of a set of layers of the neural network, wherein the weights of the neural network are ternary weight values such that each weight is encoded in the memory of the neural network inference circuit as one of zero, a positive value for the weight, and a negation of the positive value for the weight; and
when the required amount of weight memory and the required amount of activation memory for at least one layer is greater than the particular amount of available memory, generating the neural network program instructions for the neural network inference circuit to execute a first set of the layers of the neural network multiple times for different blocks of input data and execute a second set of the layers in a single pass.
|
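The memory check and the resulting split of layers in claim 1 can be illustrated with a minimal sketch. It is not the patented compiler. Several details are assumptions made only for illustration: the names `Layer`, `weight_memory_bytes`, and `plan_passes`; a sparse encoding that costs 2 bits per non-zero ternary weight and nothing for zeros; reading "weight memory and activation memory ... greater than the available memory" as a comparison of their sum against the budget; and placing every layer up to the last overflowing one in the multi-pass set. The claim itself does not specify how the two sets of layers are chosen.

```python
import math
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical layer description (not taken from the patent)."""
    name: str
    num_filters: int
    weights_per_filter: int
    activation_bytes: int  # assumed activation memory requirement


def weight_memory_bytes(layers, nonzero_fraction, bits_per_nonzero=2):
    """Estimate weight memory from filter count and weight sparsity.

    Assumption: only non-zero ternary weights are stored, each costing
    `bits_per_nonzero` bits (e.g. one sign bit plus one bit of overhead).
    """
    total_weights = sum(l.num_filters * l.weights_per_filter for l in layers)
    return math.ceil(total_weights * nonzero_fraction * bits_per_nonzero / 8)


def plan_passes(layers, available_bytes, nonzero_fraction):
    """Split layers into a multi-pass (blocked) set and a single-pass set."""
    w_bytes = weight_memory_bytes(layers, nonzero_fraction)
    # Indices of layers whose activations do not fit next to the weights.
    overflow = [i for i, l in enumerate(layers)
                if w_bytes + l.activation_bytes > available_bytes]
    if not overflow:
        # Everything fits: the whole network runs in a single pass.
        return {"multi_pass": [], "single_pass": [l.name for l in layers]}
    # Assumed heuristic: layers up to the last overflowing one are run
    # once per block of input data; the remaining layers run once.
    cut = overflow[-1] + 1
    return {
        "multi_pass": [l.name for l in layers[:cut]],
        "single_pass": [l.name for l in layers[cut:]],
    }


# Example network with made-up sizes.
network = [
    Layer("conv1", num_filters=8, weights_per_filter=100, activation_bytes=4000),
    Layer("conv2", num_filters=16, weights_per_filter=100, activation_bytes=1000),
    Layer("fc", num_filters=10, weights_per_filter=100, activation_bytes=100),
]
plan = plan_passes(network, available_bytes=2000, nonzero_fraction=0.5)
```

With these made-up numbers, the weights need 425 bytes, so only `conv1` (4000 bytes of activations) exceeds the 2000-byte budget and is assigned to the multi-pass set, while `conv2` and `fc` run in a single pass.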