CPC G06N 3/04 (2013.01) [G06N 3/063 (2013.01); G06N 3/082 (2013.01); G06T 1/20 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)] | 14 Claims |
1. One or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
processing, via a graphics multiprocessor having a single instruction multiple thread (SIMT) architecture, a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format, wherein the executable computer program instructions provide a machine learning framework to provide a library of machine learning primitives to accelerate machine-learning operations, and processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an 8-bit integer format having a static precision, wherein quantizing the weights includes:
generating a quantization table to enable non-uniform quantization of the weights, wherein generating the quantization table includes executing a quantization primitive provided by the machine learning framework, and the machine learning framework provides a plurality of quantization primitives including a plurality of quantization and dequantization operations, and
quantizing the weights from the floating-point format to the 8-bit integer format using the quantization table; and
performing an inference operation utilizing the processed CNN with the weights in the 8-bit integer format.
|
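The quantization steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim does not say how the table is built, so the quantile-based table below is an assumption, and every name (`build_quantization_table`, `quantize_weight`, `dequantize_weight`, `int8_dot`) is hypothetical rather than a primitive of any real framework. The sketch runs on the CPU in plain Python, does not model the SIMT graphics multiprocessor, and stands in for the inference operation with a single dot product.

```python
import random
from bisect import bisect_left

LEVELS = 256       # number of codes available in an 8-bit integer
INT8_OFFSET = 128  # maps table index 0..255 to int8 code -128..127


def build_quantization_table(weights):
    """Hypothetical quantization primitive: one table entry per weight quantile.

    Entries cluster where the weights are dense, so the step between
    neighbouring entries is not constant (non-uniform quantization).
    """
    ordered = sorted(weights)
    n = len(ordered)
    # Take the value at the midpoint of each of the LEVELS equal-count buckets.
    return [ordered[min(n - 1, int((i + 0.5) * n / LEVELS))] for i in range(LEVELS)]


def quantize_weight(weight, table):
    """Return the int8 code of the table entry nearest to a float weight."""
    i = bisect_left(table, weight)
    if i == 0:
        idx = 0
    elif i == len(table):
        idx = len(table) - 1
    else:
        # Pick whichever neighbour is closer; ties go to the lower entry.
        idx = i if table[i] - weight < weight - table[i - 1] else i - 1
    return idx - INT8_OFFSET


def dequantize_weight(code, table):
    """Hypothetical dequantization primitive: look the int8 code up in the table."""
    return table[code + INT8_OFFSET]


def int8_dot(codes, activations, table):
    """Stand-in for inference: a dot product over table-decoded int8 weights."""
    return sum(dequantize_weight(c, table) * x for c, x in zip(codes, activations))


# Synthetic "trained" weights (assumed data, for illustration only).
rng = random.Random(0)
float_weights = [rng.gauss(0.0, 0.1) for _ in range(4096)]

# The table is generated once and then reused unchanged (static precision).
table = build_quantization_table(float_weights)
int8_weights = [quantize_weight(w, table) for w in float_weights]

activations = [rng.uniform(-1.0, 1.0) for _ in range(len(int8_weights))]
output = int8_dot(int8_weights, activations, table)
```

Because the table is derived from the weight distribution, its entries sit closer together near zero, where most of the synthetic weights fall, and further apart in the tails. A uniform quantizer would instead use a single fixed step size across the whole range.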