CPC G06N 3/10 (2013.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01)] | 17 Claims |
1. A method of configuring field-programmable gate array apparatus to implement an artificial neural network having a succession of interconnected neuron layers and a plurality of sets of weights, each associated with a respective neuron layer, for weighting output signals of those neuron layers, the method comprising:
processing a trained model of the neural network, in which said weights are defined in a floating-point format, to quantize each set of weights to a respective reduced-precision format in dependence on an effect of the quantization on accuracy of the model, wherein quantization is performed for each layer of the neuron layers as a progressive process in dependence on a contribution of that layer to overall precision of the model;
for each set of weights, defining a partitioning scheme for a set of block memories of said apparatus such that a plurality k of those weights can be stored in each addressable location of the set of memories, wherein k differs for different sets of weights; and
programming said apparatus to implement the neural network such that weights in each set are persistently stored in a set of said block memories partitioned according to said partitioning scheme for that set of weights, wherein the method further comprises,
after quantizing each set of weights to said reduced-precision format:
selecting a minority subset of that set comprising weights whose values require more bits than a threshold bit-width defined for that set;
defining said partitioning scheme in dependence on said threshold bit-width such that a plurality of weights in the remaining, majority subset of that set can be stored in each said addressable location; and
in programming said apparatus, storing said majority subset of the weights in said set of partitioned block memories and storing said minority subset of the weights in a set of registers of the apparatus,
wherein the threshold bit-width for that set is defined based on a clustering optimization performed on that set.
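The per-layer progressive quantization recited above — narrowing each layer's format only as far as the model's accuracy tolerates — can be sketched as follows. This is a minimal illustration, not the patented method: the `evaluate` scoring function, the `fake_quantize` uniform scheme, and the candidate bit-widths are all assumptions introduced here for demonstration.

```python
import numpy as np

def fake_quantize(w, bits):
    # Symmetric uniform quantization of a float array onto a signed grid of
    # `bits` bits; round(w/scale) cannot exceed the representable range here.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return w.copy()
    return np.round(w / scale) * scale

def progressive_quantize(layers, evaluate, tolerance, bit_choices=(16, 12, 8, 6, 4)):
    """Quantize each layer's weight set to the narrowest candidate format whose
    accuracy cost stays within `tolerance` of the floating-point baseline.
    `layers` maps layer name -> weight array; `evaluate` scores a model."""
    baseline = evaluate(layers)
    chosen, quantized = {}, dict(layers)
    for name in layers:                   # progressive: one layer at a time,
        for bits in bit_choices:          # widest to narrowest candidate
            trial = dict(quantized)
            trial[name] = fake_quantize(layers[name], bits)
            if baseline - evaluate(trial) <= tolerance:
                chosen[name] = bits
                quantized = trial         # commit; later layers see this choice
            else:
                break                     # narrower formats would only be worse
    return chosen, quantized
```

Because each layer is committed before the next is tried, a layer that contributes little to overall precision can absorb a narrower format without the search revisiting earlier decisions.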
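The minority/majority split — a clustering-derived threshold bit-width separating a few wide "outlier" weights (routed to registers) from the packable majority (routed to partitioned block memories) — can be sketched with a simple 1-D two-cluster (2-means) pass over per-weight bit-widths. The clustering choice and helper names are assumptions for illustration; the patent does not specify this particular optimization.

```python
def bits_needed(code):
    # Bits to hold a signed integer code: magnitude bits plus a sign bit.
    return abs(int(code)).bit_length() + 1

def two_means_threshold(widths):
    """1-D 2-means over per-weight bit-widths; returns the largest width in
    the narrow (majority) cluster, used as the threshold bit-width."""
    lo, hi = min(widths), max(widths)
    for _ in range(20):                      # Lloyd iterations, k = 2
        boundary = (lo + hi) / 2
        a = [w for w in widths if w <= boundary]
        b = [w for w in widths if w > boundary]
        if not a or not b:
            break
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return max(w for w in widths if w <= boundary)

def split_by_threshold(codes, threshold_bits):
    """Partition (index, code) pairs: wide outliers -> minority (registers),
    the rest -> majority (packed block memories)."""
    majority, minority = [], []
    for idx, c in enumerate(codes):
        (minority if bits_needed(c) > threshold_bits else majority).append((idx, c))
    return majority, minority
```

Keeping the indices alongside the codes lets the programmed hardware restore each register-held outlier to its original position in the weight set.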