US 11,693,658 B2
Compute optimizations for neural networks using ternary weight
Kevin Nealis, San Jose, CA (US); Anbang Yao, Beijing (CN); Xiaoming Chen, Shanghai (CN); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Sara S. Baghsorkhi, San Jose, CA (US); Eriko Nurvitadhi, Hillsboro, OR (US); Balaji Vembu, Folsom, CA (US); Nicolas C. Galoppo Von Borries, Portland, OR (US); Rajkishore Barik, Santa Clara, CA (US); Tsung-Han Lin, Campbell, CA (US); and Kamal Sinha, Cordova, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jul. 26, 2021, as Appl. No. 17/443,376.
Application 17/443,376 is a continuation of application No. 16/505,012, filed on Jul. 8, 2019, granted, now 11,074,072.
Application 16/505,012 is a continuation of application No. 15/494,710, filed on Apr. 24, 2017, granted, now 10,410,098, issued on Sep. 10, 2019.
Prior Publication US 2021/0373886 A1, Dec. 2, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06N 3/063 (2023.01); G06N 3/084 (2023.01); G06T 1/20 (2006.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)

CPC G06F 9/3001 (2013.01) [G06F 9/3851 (2013.01); G06F 9/3887 (2013.01); G06F 9/3893 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01); G06T 1/20 (2013.01); G06F 2207/4824 (2013.01)]

20 Claims

1. A compute apparatus comprising:

a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a ternary weight associated with a neural network, wherein the ternary weight represents a weight value of one of positive one, zero, and negative one; and

an arithmetic logic unit including a multiplier, an adder, and an accumulator register, wherein to execute the decoded instruction, the multiplier is to perform a multiplication operation on the multi-bit input value based on the ternary weight to generate an intermediate product and the adder is to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.
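
The multiply-accumulate behavior recited in claim 1 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the claimed hardware: the function names `ternary_multiply`, `ternary_mac`, and `ternary_dot` are hypothetical, the accumulator register is modeled as an unbounded Python integer, and treating the ternary multiplication as a pass, zero, or negate selection is one plausible reading rather than something the claim specifies.

```python
from typing import Sequence


def ternary_multiply(x: int, w: int) -> int:
    """Return the intermediate product of a multi-bit input and a ternary weight.

    Assumption: the weight is encoded directly as -1, 0, or +1.
    """
    if w not in (-1, 0, 1):
        raise ValueError("ternary weight must be -1, 0, or +1")
    # With a ternary weight, multiplication reduces to pass, zero, or negate.
    if w == 1:
        return x
    if w == -1:
        return -x
    return 0


def ternary_mac(acc: int, x: int, w: int) -> int:
    """Add the intermediate product to the accumulator and return the updated value."""
    return acc + ternary_multiply(x, w)


def ternary_dot(inputs: Sequence[int], weights: Sequence[int], acc: int = 0) -> int:
    """Apply ternary_mac once per (input, weight) pair, as repeated instructions would."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    for x, w in zip(inputs, weights):
        acc = ternary_mac(acc, x, w)
    return acc
```

For example, `ternary_dot([3, 5, -2, 7], [1, 0, -1, -1])` accumulates 3 + 0 + 2 - 7 and returns -2.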