US 11,727,246 B2
Convolutional neural network optimization mechanism
Liwei Ma, Beijing (CN); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Barath Lakshmanan, Chandler, AZ (US); Ben J. Ashbaugh, Folsom, CA (US); Jingyi Jin, Folsom, CA (US); Jeremy Bottleson, Rancho Cordova, CA (US); Mike B. Macpherson, Portland, OR (US); Kevin Nealis, San Jose, CA (US); Dhawal Srivastava, Phoenix, AZ (US); Joydeep Ray, Folsom, CA (US); Ping T. Tang, Edison, NJ (US); Michael S. Strickland, Sunnyvale, CA (US); Xiaoming Chen, Shanghai (CN); Anbang Yao, Beijing (CN); Tatiana Shpeisman, Menlo Park, CA (US); Altug Koker, El Dorado Hills, CA (US); and Abhishek R. Appu, El Dorado Hills, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Feb. 22, 2019, as Appl. No. 16/283,021.
Application 16/283,021 is a continuation of application No. 15/488,551, filed on Apr. 17, 2017.
Prior Publication US 2019/0188554 A1, Jun. 20, 2019
Int. Cl. G06N 3/04 (2023.01); G06N 3/082 (2023.01); G06N 3/063 (2023.01); G06T 1/20 (2006.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)
CPC G06N 3/04 (2013.01) [G06N 3/063 (2013.01); G06N 3/082 (2013.01); G06T 1/20 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)] 14 Claims
OG exemplary drawing
 
1. One or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
processing, via a graphics multiprocessor having a single instruction multiple thread (SIMT) architecture, a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format, wherein the executable computer program instructions provide a machine learning framework that provides a library of machine learning primitives to accelerate machine-learning operations, wherein processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an 8-bit integer format having a static precision, and wherein quantizing the weights includes:
generating a quantization table to enable non-uniform quantization of the weights, wherein generating the quantization table includes executing a quantization primitive provided by the machine learning framework, and wherein the machine learning framework provides a plurality of quantization primitives including a plurality of quantization and dequantization operations, and
quantizing the weights from the floating-point format to the 8-bit integer format using the quantization table; and
performing an inference operation utilizing the processed CNN with the weights in the 8-bit integer format.
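Claim 1 recites table-based, non-uniform quantization of pretrained floating-point weights to an 8-bit integer format with static precision, followed by inference using the quantized weights. The claim does not fix a particular algorithm for constructing the quantization table, so the following is a minimal illustrative sketch rather than the patented implementation: it assumes a per-tensor codebook built with simple 1-D k-means (Lloyd) clustering, and the function names (build_quantization_table, quantize_weights, dequantize_weights) are hypothetical stand-ins for the quantization and dequantization primitives a machine learning framework might expose.

```python
# Illustrative sketch of table-based non-uniform weight quantization.
# Hypothetical names and algorithm choices; not the patented implementation.
import numpy as np

def build_quantization_table(weights, num_levels=256, iters=20):
    """Build a non-uniform quantization table (codebook) for the weights.

    Uses 1-D k-means (Lloyd's algorithm) so table entries concentrate
    where weight values are dense, instead of being spaced uniformly
    across the dynamic range.
    """
    flat = weights.ravel().astype(np.float32)
    # Initialize table entries at evenly spaced quantiles of the weights.
    table = np.quantile(flat, np.linspace(0.0, 1.0, num_levels)).astype(np.float32)
    for _ in range(iters):
        # Assign each weight to its nearest table entry.
        idx = np.abs(flat[:, None] - table[None, :]).argmin(axis=1)
        # Move each table entry to the mean of the weights assigned to it.
        for k in range(num_levels):
            members = flat[idx == k]
            if members.size:
                table[k] = members.mean()
        table.sort()
    return table

def quantize_weights(weights, table):
    """Map floating-point weights to 8-bit integer indices into the table."""
    idx = np.abs(weights[..., None] - table).argmin(axis=-1)
    return (idx - 128).astype(np.int8)   # store as signed 8-bit integers

def dequantize_weights(q_weights, table):
    """Recover approximate floating-point weights by table lookup."""
    return table[q_weights.astype(np.int32) + 128]

# Usage: quantize a layer's weights once (static precision), then run
# inference with the int8 weights, dequantizing by table lookup.
w = np.random.randn(64, 128).astype(np.float32)   # pretrained fp32 weights
table = build_quantization_table(w)               # quantization table
w_q = quantize_weights(w, table)                  # int8 weights stored with the model
x = np.random.randn(128).astype(np.float32)       # example input activation
y = dequantize_weights(w_q, table) @ x            # inference with quantized weights
```

A non-uniform table of this kind places more representation levels where the weight distribution is dense (typically near zero), which is why table-based 8-bit quantization can preserve accuracy better than uniform 8-bit quantization at the same storage cost.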