US 12,175,368 B2
Training sparse networks with discrete weight values
Steven L. Teig, Menlo Park, CA (US); and Eric A. Sather, Palo Alto, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Perceive Corporation, San Jose, CA (US)
Filed on Nov. 7, 2022, as Appl. No. 17/982,448.
Application 17/982,448 is a continuation of application No. 15/921,622, filed on Mar. 14, 2018, granted, now Pat. No. 11,537,870.
Claims priority of provisional application 62/627,407, filed on Feb. 7, 2018.
Prior Publication US 2023/0084673 A1, Mar. 16, 2023
Int. Cl. G06N 3/08 (2023.01); G06N 3/084 (2023.01); G06N 7/01 (2023.01); H04L 1/24 (2006.01)
CPC G06N 3/08 (2013.01) [G06N 3/084 (2013.01); G06N 7/01 (2023.01); H04L 1/24 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A method for training a neural network comprising a plurality of nodes that use a plurality of weights, wherein each node of a set of the nodes produces a node output value by computing a dot product of weight values for the node and input values for the node that are node output values of previous nodes, the method comprising:
propagating a plurality of inputs through the neural network to generate an output for each of the inputs, wherein each weight of a set of the weights is defined as a probability distribution across a set of allowable values for the weight, wherein for each weight, the set of allowable values for the weight comprises the value zero, a positive value for the weight, and a negation of the positive value for the weight, wherein propagating a particular input through the neural network comprises, for at least a particular node:
computing a node output value probability distribution by computing (i) a mean node output value for the particular node based on a dot product of means of the weight values for the particular node and the input values for the particular node and (ii) a variance for the particular node based on variances of the weight values for the particular node and the input values for the particular node; and
randomly sampling from the computed node output value probability distribution for the particular node to determine the node output value for the particular node; and
using the outputs generated for the plurality of inputs to train the weights.
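The forward pass recited in the claim can be sketched numerically. In the sketch below, each ternary weight's distribution over its allowable values {0, +a, −a} is assumed to be parameterized by probabilities for the positive and negative values (the zero probability being implicit), and the node output value probability distribution is approximated as a Gaussian with the claimed mean and variance; these parameterization choices and all names (`weight_moments`, `sample_node_output`) are illustrative assumptions, not language from the patent.

```python
import numpy as np

def weight_moments(a, p_pos, p_neg):
    """Mean and variance of a ternary weight over {0, +a, -a}.

    p_zero = 1 - p_pos - p_neg is implicit; the zero value
    contributes nothing to E[w] or E[w^2].
    """
    mean = a * (p_pos - p_neg)            # E[w]
    second = a ** 2 * (p_pos + p_neg)     # E[w^2]
    return mean, second - mean ** 2       # (mean, variance)

def sample_node_output(x, a, p_pos, p_neg, rng):
    """Propagate inputs x through one node: compute the mean and
    variance of the dot product from the weight moments, then
    randomly sample a node output value from that distribution."""
    mu_w, var_w = weight_moments(a, p_pos, p_neg)
    mean_out = x @ mu_w                   # dot product of inputs and weight means
    var_out = (x ** 2) @ var_w            # inputs treated as fixed sampled values
    std = np.sqrt(np.maximum(var_out, 0.0))
    return rng.normal(mean_out, std)      # random sample per the claim
```

As a sanity check on the moments: with `p_pos = 1` a weight is deterministically +a, its variance is zero, and the sampled node output collapses to the ordinary dot product; with `p_pos = p_neg = 0.5` and `a = 1` the weight has mean 0 and variance 1.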