US 12,373,911 B2
Compute optimizations for low precision machine learning operations
Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Sara S. Baghsorkhi, San Jose, CA (US); Anbang Yao, Beijing (CN); Kevin Nealis, San Jose, CA (US); Xiaoming Chen, Shanghai (CN); Altug Koker, El Dorado Hills, CA (US); Abhishek R. Appu, El Dorado Hills, CA (US); John C. Weast, Portland, OR (US); Mike B. Macpherson, Portland, OR (US); Dukhwan Kim, San Jose, CA (US); Linda L. Hurd, Cool, CA (US); Ben J. Ashbaugh, Folsom, CA (US); Barath Lakshmanan, Chandler, AZ (US); Liwei Ma, Beijing (CN); Joydeep Ray, Folsom, CA (US); Ping T. Tang, Edison, NJ (US); and Michael S. Strickland, Sunnyvale, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Aug. 25, 2023, as Appl. No. 18/456,235.
Application 18/456,235 is a continuation of application No. 17/978,573, filed on Nov. 1, 2022, granted, now 11,948,224.
Application 17/978,573 is a continuation of application No. 17/960,611, filed on Oct. 5, 2022, granted, now 12,148,063.
Application 17/960,611 is a continuation of application No. 17/720,804, filed on Apr. 14, 2022, granted, now 11,468,541, issued on Oct. 11, 2022.
Application 17/720,804 is a continuation of application No. 16/983,080, filed on Aug. 3, 2020, granted, now 11,308,574, issued on Apr. 19, 2022.
Application 16/983,080 is a continuation of application No. 16/446,265, filed on Jun. 19, 2019, granted, now 11,138,686, issued on Oct. 5, 2021.
Application 16/446,265 is a continuation of application No. 16/197,821, filed on Nov. 21, 2018, granted, now 10,853,906, issued on Dec. 1, 2020.
Application 16/197,821 is a continuation of application No. 15/789,565, filed on Oct. 20, 2017, granted, now 10,242,423, issued on Mar. 26, 2019.
Application 15/789,565 is a continuation of application No. 15/581,167, filed on Apr. 28, 2017, granted, now 10,726,514, issued on Jul. 28, 2020.
Prior Publication US 2023/0401668 A1, Dec. 14, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 1/20 (2006.01); G06F 3/14 (2006.01); G06F 7/483 (2006.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06F 9/50 (2006.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2023.01); G06N 3/084 (2023.01); G06N 20/00 (2019.01); G06T 1/60 (2006.01); G06T 15/00 (2011.01)
CPC G06T 1/20 (2013.01) [G06F 7/483 (2013.01); G06F 9/30014 (2013.01); G06F 9/30185 (2013.01); G06F 9/3863 (2013.01); G06F 9/5044 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01); G06N 20/00 (2019.01); G06F 3/14 (2013.01); G06T 1/60 (2013.01); G06T 15/005 (2013.01)]
25 Claims
 
1. A graphics processor comprising:
a memory device;
a compressor to compress data to be written to the memory device; and
a streaming multiprocessor coupled with the memory device, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to concurrently execute multiple threads, including a first thread in parallel with a second thread,
wherein the first thread is configured to process a first instruction to cause a first portion of the streaming multiprocessor to perform a floating-point operation on multiple floating-point input operands,
wherein the second thread is configured to process a second instruction to cause a second portion of the streaming multiprocessor to perform an integer operation on multiple integer operands,
wherein the streaming multiprocessor is to perform operations for a third instruction, the streaming multiprocessor to perform a first operation of the third instruction on 16-bit floating-point input and a second operation of the third instruction on input that includes a 32-bit floating-point input,
wherein the streaming multiprocessor is to perform operations for a fourth instruction, the streaming multiprocessor to perform a third operation on 8-bit integer input and a fourth operation on input that includes a 32-bit integer input, and
wherein the first operation of the fourth instruction includes a multiply and the second operation of the fourth instruction includes an accumulate.
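
The mixed-precision arithmetic recited above (a multiply on narrow inputs, followed by an accumulate that involves a wider value) can be illustrated with a minimal software sketch. It is not the claimed hardware and not any vendor's instruction set. The function names (to_fp16, to_fp32, fp16_mul_fp32_acc, int8_mul_int32_acc) are hypothetical, and the wrap-around behavior on 32-bit integer overflow is an assumption; the claim does not say whether an implementation wraps or saturates.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_mul_fp32_acc(a, b, acc=0.0):
    """Multiply pairs of 16-bit float inputs, accumulate in 32-bit float."""
    acc = to_fp32(acc)
    for x, y in zip(a, b):
        # Inputs are rounded to 16-bit floats before the multiply.
        prod = to_fp16(x) * to_fp16(y)
        # The running sum is kept at 32-bit float precision.
        acc = to_fp32(acc + prod)
    return acc


def wrap_int32(v: int) -> int:
    """Wrap an integer into the signed 32-bit range (assumed behavior)."""
    return (v + 2**31) % 2**32 - 2**31


def int8_mul_int32_acc(a, b, acc=0):
    """Multiply pairs of 8-bit integers, accumulate in a 32-bit integer."""
    acc = wrap_int32(acc)
    for x, y in zip(a, b):
        if not (-128 <= x <= 127 and -128 <= y <= 127):
            raise ValueError("inputs must fit in a signed 8-bit integer")
        # Each product fits in 16 bits; the sum is held in 32 bits.
        acc = wrap_int32(acc + x * y)
    return acc


if __name__ == "__main__":
    print(fp16_mul_fp32_acc([1.5, 2.0], [2.0, 0.25], acc=1.0))
    print(int8_mul_int32_acc([127, -128, 5], [127, 127, -3]))
```

The wider accumulator is what preserves small contributions: adding 1.0 to 2048.0 in this sketch yields 2049.0, whereas 2049.0 is not representable as a 16-bit float and would round back to 2048.0.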