US 11,948,224 B2
Compute optimizations for low precision machine learning operations
Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Sara S. Baghsorkhi, San Jose, CA (US); Anbang Yao, Beijing (CN); Kevin Nealis, San Jose, CA (US); Xiaoming Chen, Shanghai (CN); Altug Koker, El Dorado Hills, CA (US); Abhishek R. Appu, El Dorado Hills, CA (US); John C. Weast, Portland, OR (US); Mike B. Macpherson, Portland, OR (US); Dukhwan Kim, San Jose, CA (US); Linda L. Hurd, Cool, CA (US); Ben J. Ashbaugh, Folsom, CA (US); Barath Lakshmanan, Chandler, AZ (US); Liwei Ma, Beijing (CN); Joydeep Ray, Folsom, CA (US); Ping T. Tang, Edison, NJ (US); and Michael S. Strickland, Sunnyvale, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Nov. 1, 2022, as Appl. No. 17/978,573.
Application 17/978,573 is a continuation of application No. 17/960,611, filed on Oct. 5, 2022.
Application 17/960,611 is a continuation of application No. 17/720,804, filed on Apr. 14, 2022, granted, now 11,468,541, issued on Oct. 11, 2022.
Application 17/720,804 is a continuation of application No. 16/983,080, filed on Aug. 3, 2020, granted, now 11,308,574, issued on Apr. 19, 2022.
Application 16/983,080 is a continuation of application No. 16/446,265, filed on Jun. 19, 2019, granted, now 11,138,686, issued on Oct. 5, 2021.
Application 16/446,265 is a continuation of application No. 16/197,821, filed on Nov. 21, 2018, granted, now 10,853,906, issued on Dec. 1, 2020.
Application 16/197,821 is a continuation of application No. 15/789,565, filed on Oct. 20, 2017, granted, now 10,242,423, issued on Mar. 26, 2019.
Application 15/789,565 is a continuation of application No. 15/581,167, filed on Apr. 28, 2017, granted, now 10,726,514, issued on Jul. 28, 2020.
Prior Publication US 2023/0061670 A1, Mar. 2, 2023
Int. Cl. G06T 1/20 (2006.01); G06F 3/14 (2006.01); G06F 7/483 (2006.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06F 9/50 (2006.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01); G06N 3/084 (2023.01); G06N 20/00 (2019.01); G06T 1/60 (2006.01); G06T 15/00 (2011.01)
CPC G06T 1/20 (2013.01) [G06F 7/483 (2013.01); G06F 9/30014 (2013.01); G06F 9/30185 (2013.01); G06F 9/3863 (2013.01); G06F 9/5044 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01); G06N 20/00 (2019.01); G06F 3/14 (2013.01); G06T 1/60 (2013.01); G06T 15/005 (2013.01)] 20 Claims
OG exemplary drawing
 
1. An apparatus, comprising:
a memory stack including multiple memory dies; and
a parallel processor including a plurality of multiprocessors, each multiprocessor having a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces, at least one multiprocessor comprising:
a multiply-accumulate circuit to perform operations including multiply-accumulate operations on matrix data in a first layer of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, wherein the multiply-accumulate operations include a matrix multiplication operation with values smaller than 32 bits and an accumulate operation having a 32-bit input;
precision tracking logic to dynamically evaluate metrics associated with the matrix data elements and, based on the metrics, to indicate if an optimization is to be performed to represent data at a second layer of the neural network implementation, wherein the optimization includes to apply a numerical transform operation to the data to enable representation of the data at the second layer at a second precision; and
a numerical transform unit to dynamically perform the numerical transform operation on the matrix data elements, based on the indication, to produce transformed matrix data elements at the second precision, wherein the second precision is lower than the first precision.
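The claimed pipeline (mixed-precision multiply-accumulate, metric-driven precision tracking, and a numerical transform down to a second, lower precision) can be sketched in software. The following NumPy sketch is purely illustrative of the claimed technique, not the patented hardware: the function names and the round-trip error metric used by `needs_lower_precision` are hypothetical stand-ins for the unspecified "metrics" the precision tracking logic evaluates.

```python
import numpy as np

def mixed_precision_matmul(a, b):
    # Round the multiply inputs to float16 (values "smaller than 32 bits") ...
    a16 = a.astype(np.float16)
    b16 = b.astype(np.float16)
    # ... then perform the accumulate with 32-bit inputs, yielding a
    # float32 result matrix (the claim's "first precision").
    return np.matmul(a16.astype(np.float32), b16.astype(np.float32))

def needs_lower_precision(result, rel_tol=1e-3):
    # Hypothetical precision-tracking metric: round-trip the float32 result
    # through float16 and compare the worst-case absolute error against the
    # largest magnitude in the matrix. If the loss is tolerable, indicate
    # that the lower-precision optimization may be applied.
    lowered = result.astype(np.float16).astype(np.float32)
    scale = float(np.max(np.abs(result))) + 1e-12
    err = float(np.max(np.abs(result - lowered))) / scale
    return err <= rel_tol

def transform_to_lower_precision(result):
    # "Numerical transform": represent the data at the second, lower
    # precision for consumption by the next network layer.
    return result.astype(np.float16)
```

A layer-to-layer hand-off under these assumptions would compute `r = mixed_precision_matmul(a, b)` and then, only when `needs_lower_precision(r)` indicates the optimization, pass `transform_to_lower_precision(r)` to the next layer; otherwise the float32 result is kept. Normalizing the error by the matrix's largest magnitude (rather than elementwise) keeps near-zero elements from vetoing the down-conversion.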