US 11,783,200 B2
Artificial neural network implementation in field-programmable gate arrays
Dionysios Diamantopoulos, Thalwil (CH); Heiner Giefers, Langnau am Albis (CH); and Christoph Hagleitner, Wallisellen (CH)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Feb. 8, 2019, as Appl. No. 16/270,778.
Prior Publication US 2020/0257986 A1, Aug. 13, 2020
Int. Cl. G06N 3/04 (2023.01); G06N 3/10 (2006.01); G06N 3/08 (2023.01)
CPC G06N 3/10 (2013.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01)] 17 Claims
 
1. A method of configuring field-programmable gate array apparatus to implement an artificial neural network having a succession of interconnected neuron layers and a plurality of sets of weights, each associated with a respective neuron layer, for weighting output signals of those neuron layers, the method comprising:
processing a trained model of the neural network, in which said weights are defined in a floating-point format, to quantize each set of weights to a respective reduced-precision format in dependence on the effect of the quantization on the accuracy of the model, wherein quantization is performed for each of the neuron layers as a progressive process in dependence on the contribution of that layer to the overall precision of the model;
for each set of weights, defining a partitioning scheme for a set of block memories of said apparatus such that a plurality k of those weights can be stored in each addressable location of the set of memories, wherein k differs for different sets of weights; and
programming said apparatus to implement the neural network such that weights in each set are persistently stored in a set of said block memories partitioned according to said partitioning scheme for that set of weights, wherein the method further comprises, after quantizing each set of weights to said reduced-precision format:
selecting a minority subset of that set comprising weights whose values exceed a threshold bit-width for that set;
defining said partitioning scheme in dependence on said threshold bit-width such that a plurality of weights in the remaining, majority subset of that set can be stored in each said addressable location; and
in programming said apparatus, storing said majority subset of the weights in said set of partitioned block memories and storing said minority subset of the weights in a set of registers of the apparatus,
wherein the threshold bit-width for that set is defined based on a clustering optimization performed on that set.
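The quantization step of claim 1 requires only that each layer's reduced-precision format be chosen progressively, according to the layer's contribution to model precision and the resulting loss of accuracy; the claim does not fix a quantizer or a search order. The Python sketch below is one possible reading under stated assumptions: symmetric uniform fixed-point quantization, a caller-supplied `evaluate` function, and a hypothetical per-layer `contribution` score. It greedily lowers each layer's bit-width, least-contributing layers first, while the accuracy drop stays within `max_drop`. The names `quantize`, `progressive_quantize` and `toy_accuracy` are illustrative, and the toy accuracy proxy stands in for validation-set inference.

```python
from typing import Callable, Dict, List, Sequence

Model = Dict[str, List[float]]  # layer name -> weights (hypothetical container)


def quantize(weights: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantization to signed `bits`-bit integers, returned
    dequantized so the result can be fed back into the same evaluator."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    max_abs = max(abs(w) for w in weights) or 1.0
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def progressive_quantize(model: Model,
                         evaluate: Callable[[Model], float],
                         contribution: Dict[str, float],
                         start_bits: int = 16,
                         min_bits: int = 2,
                         max_drop: float = 0.01) -> Dict[str, int]:
    """Greedily lower each layer's bit-width while the accuracy drop relative to
    the floating-point model stays within `max_drop`. Layers with the smallest
    contribution score are visited first, so they absorb most of the budget."""
    baseline = evaluate(model)
    bits = {name: start_bits for name in model}
    current = {name: quantize(w, start_bits) for name, w in model.items()}
    for name in sorted(model, key=lambda n: contribution[n]):
        while bits[name] > min_bits:
            trial = dict(current)
            trial[name] = quantize(model[name], bits[name] - 1)
            if baseline - evaluate(trial) > max_drop:
                break                      # one bit fewer costs too much accuracy
            bits[name] -= 1
            current = trial                # accept and keep shrinking this layer
    return bits


# Toy usage: a stand-in "accuracy" that penalizes contribution-weighted
# quantization error. A real flow would run inference on a validation set.
float_model: Model = {"conv1": [0.5, -0.25, 0.125, -1.0],
                      "fc": [0.5, -0.25, 0.125, -1.0]}
contrib = {"conv1": 1.0, "fc": 0.1}


def toy_accuracy(m: Model) -> float:
    err = 0.0
    for name, ws in m.items():
        ref = float_model[name]
        err += contrib[name] * sum((a - b) ** 2 for a, b in zip(ws, ref)) / len(ref)
    return 1.0 - err


bits = progressive_quantize(float_model, toy_accuracy, contrib, max_drop=0.001)
print(bits)  # {'conv1': 5, 'fc': 4}
```

Although both toy layers hold identical weights, the layer with the larger contribution score keeps more bits, because it is visited after the low-contribution layer has already consumed part of the accuracy budget.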
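The partitioning step amounts to choosing, for each set of weights, how many reduced-precision weights k share one addressable block-memory location, so that k differs between sets of different bit-widths. A minimal sketch, assuming a 36-bit word width (an illustrative figure, not taken from the claim), k = floor(word width / weight bit-width), and two's-complement fields packed least-significant slot first. The helpers `weights_per_word`, `pack` and `unpack` are hypothetical.

```python
from typing import List


def weights_per_word(word_bits: int, weight_bits: int) -> int:
    """k: how many weight_bits-wide weights fit in one addressable location."""
    return word_bits // weight_bits


def pack(values: List[int], weight_bits: int, word_bits: int = 36) -> List[int]:
    """Pack signed integers, k per word, lowest slot in the least-significant bits."""
    k = weights_per_word(word_bits, weight_bits)
    lo, hi = -(1 << (weight_bits - 1)), (1 << (weight_bits - 1)) - 1
    mask = (1 << weight_bits) - 1
    words = []
    for start in range(0, len(values), k):
        word = 0
        for slot, v in enumerate(values[start:start + k]):
            if not lo <= v <= hi:
                raise ValueError(f"{v} does not fit in {weight_bits} bits")
            word |= (v & mask) << (slot * weight_bits)   # two's-complement field
        words.append(word)
    return words


def unpack(words: List[int], count: int, weight_bits: int, word_bits: int = 36) -> List[int]:
    """Inverse of pack(): recover `count` signed weights from the packed words."""
    k = weights_per_word(word_bits, weight_bits)
    mask = (1 << weight_bits) - 1
    out: List[int] = []
    for word in words:
        for slot in range(k):
            if len(out) == count:
                return out
            raw = (word >> (slot * weight_bits)) & mask
            if raw >> (weight_bits - 1):             # sign bit set
                raw -= 1 << weight_bits
            out.append(raw)
    return out


# Different layers get a different k for the same 36-bit location.
for name, b in {"conv1": 4, "conv2": 6, "fc": 12}.items():
    print(name, b, "bits -> k =", weights_per_word(36, b))

layer = [-8, 7, 3, -1, 0, 5, -6, 2, 1, 4]          # ten 4-bit weights
packed = pack(layer, weight_bits=4)
assert unpack(packed, len(layer), weight_bits=4) == layer
print(len(packed), "words")                        # 2 words (k = 9)
```

Placing the lowest slot in the least-significant bits is an arbitrary choice here; a real HLS or RTL design would fix the field order to match its read logic.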
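The final clauses split each quantized set into a majority subset that fits a threshold bit-width, which is packed into the partitioned block memories, and a minority subset of wider outliers, which is held in registers, with the threshold derived from a clustering optimization. The claim does not name the clustering method. The sketch below substitutes a plain one-dimensional 2-means over per-weight bit-widths, treats the threshold as the widest member of the low-width cluster, and again assumes a 36-bit memory word. The functions `required_bits`, `two_means_threshold` and `split_weights` are hypothetical.

```python
from typing import List, Tuple


def required_bits(v: int) -> int:
    """Smallest two's-complement width that can hold v (e.g. 7 -> 4, -8 -> 4, 8 -> 5)."""
    return (v if v >= 0 else ~v).bit_length() + 1


def two_means_threshold(widths: List[int], iters: int = 20) -> int:
    """1-D 2-means on per-weight bit-widths; the threshold is the largest width
    assigned to the low-width cluster."""
    lo_c, hi_c = float(min(widths)), float(max(widths))
    if lo_c == hi_c:
        return int(lo_c)                             # uniform widths: no outliers
    for _ in range(iters):
        low = [w for w in widths if abs(w - lo_c) <= abs(w - hi_c)]
        high = [w for w in widths if abs(w - lo_c) > abs(w - hi_c)]
        if not low or not high:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo_c, hi_c):
            break                                    # converged
        lo_c, hi_c = new_lo, new_hi
    return max(w for w in widths if abs(w - lo_c) <= abs(w - hi_c))


def split_weights(values: List[int]) -> Tuple[int, List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (threshold, majority, minority), the subsets as (index, value) pairs."""
    threshold = two_means_threshold([required_bits(v) for v in values])
    majority: List[Tuple[int, int]] = []
    minority: List[Tuple[int, int]] = []
    for i, v in enumerate(values):
        (minority if required_bits(v) > threshold else majority).append((i, v))
    return threshold, majority, minority


layer = [3, -2, 1, 0, -4, 5, 2, -1, 120, -3, 2, -97]   # already-quantized ints
threshold, majority, minority = split_weights(layer)
k = 36 // threshold            # weights per 36-bit block-RAM word (assumed width)
print(threshold, k)            # 4 9
print(minority)                # [(8, 120), (11, -97)]
```

In this toy set, a 4-bit threshold lets nine majority weights share each word, whereas storing every weight at the outliers' 8-bit width would allow only four per word; the two outliers are kept aside, as the claim assigns them to registers.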