US 12,265,905 B2
Computation of neural network node with large input values
Jung Ko, San Jose, CA (US); Kenneth Duong, San Jose, CA (US); and Steven L. Teig, Menlo Park, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Nov. 9, 2022, as Appl. No. 17/984,228.
Application 17/984,228 is a continuation of application No. 16/212,645, filed on Dec. 6, 2018, granted, now Pat. No. 11,531,727.
Claims priority of provisional application 62/773,162, filed on Nov. 29, 2018, provisional application 62/773,164, filed on Nov. 29, 2018, provisional application 62/753,878, filed on Oct. 31, 2018, provisional application 62/742,802, filed on Oct. 8, 2018, provisional application 62/724,589, filed on Aug. 29, 2018, and provisional application 62/660,914, filed on Apr. 20, 2018.
Prior Publication US 2023/0076850 A1, Mar. 9, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 7/544 (2006.01); G06F 1/03 (2006.01); G06F 5/01 (2006.01); G06F 9/30 (2018.01); G06F 17/10 (2006.01); G06F 17/16 (2006.01); G06N 3/048 (2023.01); G06N 3/06 (2006.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01); G06N 3/084 (2023.01); G06N 5/04 (2023.01); G06N 5/046 (2023.01); G06N 20/00 (2019.01)
CPC G06N 3/063 (2013.01) [G06F 1/03 (2013.01); G06F 5/01 (2013.01); G06F 7/5443 (2013.01); G06F 9/30098 (2013.01); G06F 9/30145 (2013.01); G06F 17/10 (2013.01); G06F 17/16 (2013.01); G06N 3/048 (2023.01); G06N 3/06 (2013.01); G06N 3/08 (2013.01); G06N 3/084 (2013.01); G06N 5/04 (2013.01); G06N 5/046 (2013.01); G06N 20/00 (2019.01)] 19 Claims
OG exemplary drawing
 
1. For a neural network inference circuit that executes a neural network comprising a plurality of computation nodes, the neural network inference circuit comprising a plurality of dot product cores for computing partial dot products, each of a set of the plurality of computation nodes comprising a dot product of input values and ternary weight values, a method for computing an output value for a particular computation node, the method comprising:
at each core of a set of the plurality of dot product cores of the neural network inference circuit, loading (i) data for a set of ternary weight values for the particular computation node into a weight value buffer of the core, (ii) a first portion of each input value of a set of input values for the particular computation node into a first input value buffer of the core, and (iii) a second portion of each of the input values into a second input value buffer of the core;
at a set of dot product computation circuits of the neural network inference circuit, the set of dot product computation circuits comprising a partial dot product computation circuit of each core of the set of dot product cores:
computing a first dot product between the set of ternary weight values from the weight value buffers of the set of cores and the first portion of each of the input values from the first input value buffers of the set of cores;
computing a second dot product between the set of ternary weight values and the second portion of each of the input values from the second input value buffers of the set of cores;
bit-shifting the second dot product to generate a bit-shifted second dot product; and
adding the first dot product with the bit-shifted second dot product to generate a computed dot product for the particular computation node; and
at a set of post-processing circuits of the neural network inference circuit, performing a set of post-processing operations to compute the output value for the particular computation node from the computed dot product for the particular computation node.
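The claimed computation relies on the identity that a dot product over wide input values can be decomposed into two dot products over narrower portions, with the high-portion result bit-shifted before the final addition. A minimal software sketch of that decomposition follows; the function name, the 4-bit split width, and the use of NumPy are illustrative assumptions, not part of the claimed hardware circuit.

```python
import numpy as np

def split_dot_product(inputs, weights, low_bits=4):
    """Illustrative sketch: compute dot(weights, inputs) as two narrower
    dot products, mirroring the claim's first/second input value buffers.

    Each input x is split so that x == x_lo + (x_hi << low_bits);
    weights are ternary, i.e. each weight is in {-1, 0, +1}.
    """
    inputs = np.asarray(inputs, dtype=np.int64)
    weights = np.asarray(weights, dtype=np.int64)
    assert set(np.unique(weights)) <= {-1, 0, 1}, "weights must be ternary"

    mask = (1 << low_bits) - 1
    lo = inputs & mask           # first portion of each input value
    hi = inputs >> low_bits      # second portion of each input value

    first = int(np.dot(weights, lo))    # first dot product (low portions)
    second = int(np.dot(weights, hi))   # second dot product (high portions)

    # Bit-shift the second dot product, then add it to the first,
    # recovering the full-width dot product for the computation node.
    return first + (second << low_bits)

# Example: 8-bit inputs split into two 4-bit portions.
x = [200, 17, 255, 3]
w = [1, -1, 0, 1]
print(split_dot_product(x, w))   # equals np.dot(w, x), i.e. 200 - 17 + 3 = 186
```

Splitting each activation this way lets a fixed-width partial dot product circuit handle inputs twice its native width at the cost of a second pass, a shift, and an add, which is why the claim routes the two portions through separate input value buffers.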