CPC G06N 3/084 (2013.01) [G06N 3/048 (2023.01); G06N 3/063 (2013.01); G06N 5/046 (2013.01)] | 19 Claims |
1. A method for training a plurality of parameters of a machine-trained (MT) network comprising a plurality of layers of processing nodes, the plurality of parameters subject to a sparsity constraint that requires a threshold portion of the parameters to be equal to zero, wherein a first set of the parameters subject to the sparsity constraint are grouped into groups of parameters, the method comprising:
adjusting values of the plurality of parameters by (i) propagating a plurality of training inputs through the plurality of layers of the MT network to generate network outputs, (ii) calculating a value of a loss function based at least in part on differences between the generated network outputs and expected network outputs, and (iii) using the calculated loss function value to modify the values of the first set of parameters and a second set of the parameters subject to the sparsity constraint;
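The "adjusting values" element above describes a standard gradient-based training round. The following minimal sketch (an illustration, not the patented procedure; all names and the one-parameter "network" are hypothetical) shows the three sub-steps: propagate training inputs to generate outputs, compute a loss from the differences against expected outputs, and use that loss to modify the parameter values.

```python
# Illustrative sketch of one training round: forward-propagate inputs,
# compute a loss against expected outputs, and modify the parameters
# using the loss gradient. A single-parameter "layer" keeps the
# arithmetic visible; a real MT network repeats this per layer via
# backpropagation.

w = 0.2                                   # a trainable parameter
inputs = [1.0, 2.0, 3.0]                  # training inputs
expected = [0.5, 1.0, 1.5]                # expected network outputs

def forward(w, x):
    return w * x                          # trivial one-node "layer"

def loss(w):
    # Mean squared difference between generated and expected outputs.
    return sum((forward(w, x) - y) ** 2
               for x, y in zip(inputs, expected)) / len(inputs)

l0 = loss(w)
# Analytic gradient of the loss with respect to w.
grad = sum(2 * (forward(w, x) - y) * x
           for x, y in zip(inputs, expected)) / len(inputs)
w -= 0.05 * grad                          # gradient-descent update
l1 = loss(w)
```

One update moves `w` toward the value that reproduces the expected outputs, so the loss after the step is smaller than before it.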
for each parameter of the second set of the parameters subject to the sparsity constraint, determining an accuracy penalty associated with the parameter being set to zero from the adjusted value for the parameter, wherein (i) the parameters are organized by the layers of the MT network, (ii) each parameter has a set of allowed values to which the parameter can be set, (iii) for all parameters, the set of allowed values includes the value zero, and (iv) the sets of allowed values are the same for all parameters within a respective layer of the MT network;
for each group of parameters in the first set of parameters, determining a minimum accuracy penalty for each possible number of parameters in the group being set to zero from the adjusted values for the parameters; and
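The two "determining" elements above can be sketched concretely. The claim does not specify how the per-parameter accuracy penalty is computed, so the squared adjusted value is used here as a common magnitude-based proxy (an assumption, not the claimed method); under any per-parameter penalty, the minimum group penalty for zeroing n parameters is the sum of the n smallest penalties in the group.

```python
# Hypothetical proxy for the accuracy penalty of zeroing a parameter:
# the squared adjusted value (small-magnitude parameters are cheapest
# to set to zero). The patent leaves the penalty definition open.
def accuracy_penalty(value):
    return value * value

def group_min_penalties(group_values):
    """For each possible number n of parameters in the group being set
    to zero, return the minimum total accuracy penalty: the running sum
    of the n smallest per-parameter penalties."""
    penalties = sorted(accuracy_penalty(v) for v in group_values)
    mins = [0.0]                       # zeroing 0 parameters costs nothing
    for p in penalties:
        mins.append(mins[-1] + p)
    return mins                        # mins[n] = cost of zeroing n params

# Example: a group of four adjusted parameter values.
mins = group_min_penalties([0.5, -0.1, 0.3, -0.2])
```

Sorting once gives all group minima in O(k log k) per group of size k, one entry for each possible count of zeroed parameters (0 through k).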
using the determined accuracy penalties to set to the value zero at least the threshold portion of the plurality of adjusted parameters such that the MT network fulfills the sparsity constraint to be executable on a particular type of limited-memory neural network inference circuit having sparsity requirements based on the neural network inference circuit using less memory for parameters set to the value zero than for parameters set to non-zero values.
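The final element, zeroing at least the threshold portion of parameters using the determined penalties, can be sketched as a greedy selection (again an illustrative assumption: the claimed method need not be greedy, and the squared-value penalty is a stand-in):

```python
import math

# Hedged sketch of fulfilling the sparsity constraint: set to zero at
# least the threshold fraction of parameters, choosing those whose
# zeroing carries the smallest accuracy penalty (squared adjusted
# value used as a hypothetical penalty).
def apply_sparsity(values, threshold_fraction):
    n_zero = math.ceil(threshold_fraction * len(values))
    # Parameter indices ordered by ascending penalty.
    order = sorted(range(len(values)), key=lambda i: values[i] ** 2)
    to_zero = set(order[:n_zero])
    return [0.0 if i in to_zero else v for i, v in enumerate(values)]

sparse = apply_sparsity([0.5, -0.1, 0.3, -0.2, 0.7, 0.05], 0.5)
```

A network sparsified this way suits an inference circuit that stores zero-valued parameters more compactly than non-zero ones, since at least the threshold fraction of parameter storage is eliminated.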