US 11,900,239 B2
Systems and methods for accelerating sparse neural network execution
Zhenyu Gu, Los Altos, CA (US); Liu Liu, Goleta, CA (US); Shuangchen Li, Sunnyvale, CA (US); and Yuan Xie, Sunnyvale, CA (US)
Assigned to Alibaba Group Holding Limited, Grand Cayman (KY)
Filed by ALIBABA GROUP HOLDING LIMITED, Grand Cayman (KY)
Filed on Sep. 5, 2019, as Appl. No. 16/562,376.
Claims priority of provisional application 62/869,484, filed on Jul. 1, 2019.
Prior Publication US 2021/0004665 A1, Jan. 7, 2021
Int. Cl. G06N 3/06 (2006.01); G06N 3/04 (2023.01); G06N 3/063 (2023.01)
CPC G06N 3/06 (2013.01) [G06N 3/04 (2013.01); G06N 3/063 (2013.01)]
19 Claims
 
1. A system for dynamic sparse execution of a neural network, comprising:
at least one global buffer configured to receive inputs for the neural network;
a plurality of processing elements configured to execute activation functions for nodes of the neural network; and
at least one processor configured to:
execute ternary random projection to reduce at least one dimension of the inputs from the at least one global buffer and generate a corresponding predictable output neuron map for use by the plurality of processing elements, wherein the at least one dimension is reduced according to a tunable degree for indicating a degree at which missing values are to be predicted rather than calculated;
iteratively receive current outputs of a current layer from the plurality of processing elements as inputs of a next layer, wherein at least one of the current outputs is expanded by setting values corresponding to one or more predictable output neurons;
reduce at least one dimension of the current outputs, and update the corresponding predictable output neuron map, based on the reduced current outputs, for use by the plurality of processing elements in generating next outputs until the plurality of processing elements have executed each layer of the neural network.
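
The data flow recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed hardware and not the patentees' implementation, and it rests on several assumptions the claim does not state: the activation function is ReLU; a "predictable output neuron" is one whose low-dimensional estimate is non-positive, so its value is set to zero instead of being calculated; the ternary entries are drawn with probabilities 1/6, 2/3 and 1/6; and the tunable degree is modelled as the reduced dimension `k`. All function names are hypothetical.

```python
import random


def ternary_matrix(rows, cols, rng):
    """Projection matrix with entries in {-1, 0, +1}.

    Assumed probabilities: 1/6, 2/3, 1/6 (not taken from the claim).
    """
    return [[rng.choice((-1, 0, 0, 0, 0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def predict_map(x, weights, proj):
    """Predictable output neuron map: True where an output is predicted zero.

    Inputs and weight rows are both projected from len(x) down to k
    dimensions, and the sign of the cheap low-dimensional dot product is
    used as the prediction (assumed rule). No scaling factor is applied
    because only the sign matters here.
    """
    x_low = matvec(proj, x)
    predicted = []
    for row in weights:
        w_low = matvec(proj, row)
        estimate = sum(a * b for a, b in zip(w_low, x_low))
        predicted.append(estimate <= 0)
    return predicted


def run_layer(x, weights, skip_map):
    """Stand-in for the processing elements: compute only unskipped neurons."""
    out = []
    for row, skip in zip(weights, skip_map):
        if skip:
            out.append(0.0)  # output expanded by setting the value
        else:
            out.append(max(0.0, sum(a * b for a, b in zip(row, x))))  # ReLU
    return out


def sparse_forward(x, layers, k, seed=0):
    """Iterate over layers, updating the map from each layer's outputs."""
    rng = random.Random(seed)
    for weights in layers:
        proj = ternary_matrix(k, len(x), rng)  # k: assumed tunable degree
        skip_map = predict_map(x, weights, proj)
        x = run_layer(x, weights, skip_map)
    return x
```

In this sketch a smaller `k` makes the prediction cheaper but less accurate, so more outputs risk being set to zero when an exact calculation would have given a positive value. This is one possible reading of the trade-off that the tunable degree controls.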