US 12,248,877 B2
Hybrid neural network pruning
Xiaofan Xu, Dublin (IE); Mi Sun Park, San Jose, CA (US); and Cormac M. Brick, San Francisco, CA (US)
Assigned to Movidius Ltd., Schiphol-Rijk (NL)
Filed by Movidius Ltd., Schiphol-Rijk (NL)
Filed on Dec. 19, 2018, as Appl. No. 16/226,521.
Claims priority of provisional application 62/675,601, filed on May 23, 2018.
Prior Publication US 2019/0362235 A1, Nov. 28, 2019
Int. Cl. G06N 3/082 (2023.01); G06N 3/04 (2023.01); G06N 3/044 (2023.01); G06N 20/00 (2019.01); G06T 7/70 (2017.01)
CPC G06N 3/082 (2013.01) [G06N 3/04 (2013.01); G06N 3/044 (2023.01); G06T 7/70 (2017.01); G06N 20/00 (2019.01); G06T 2207/10024 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/20081 (2013.01)] 18 Claims
OG exemplary drawing
 
1. At least one non-transitory machine-accessible storage medium having instructions stored thereon, wherein the instructions, when executed on a machine, cause the machine to prune a neural network by:
accessing data comprising a definition of the neural network, wherein the neural network comprises a layer having a plurality of channels;
generating a thinned version of the layer, wherein generating the thinned version of the layer comprises:
ranking the plurality of channels of the layer based on weight values of each of the plurality of channels,
selecting one or more channels from the plurality of channels based on the ranking,
pruning the layer by setting values of all weights in the selected one or more channels to zeros to form the thinned version of the layer, and
rounding a number of unpruned channels to a multiple corresponding to a hardware architecture;
providing input data to a thinned version of the neural network that includes the thinned version of the layer;
determining, based on an output of the neural network, that an accuracy of the thinned version of the neural network exceeds a threshold accuracy;
after determining that the accuracy of the thinned version of the neural network exceeds the threshold accuracy, further pruning the thinned version of the layer by modifying a weight in another channel of the layer to zero and keeping another weight in the another channel of the layer unmodified to form a further thinned version of the layer; and
generating a pruned version of the neural network that comprises the further thinned version of the layer.
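The claimed hybrid scheme combines a coarse stage (zeroing whole channels ranked by their weight values, with the surviving channel count rounded to a hardware-friendly multiple) with a fine stage (zeroing individual weights inside the remaining channels once the thinned network clears an accuracy threshold). A minimal sketch of those steps follows; it assumes L1-norm channel ranking and magnitude-based fine pruning, and every function name, parameter, and default below is illustrative, not taken from the patent.

```python
import numpy as np

def hybrid_prune(weights, channel_keep_ratio=0.5, hw_multiple=8,
                 fine_sparsity=0.3, accuracy_ok=lambda w: True):
    """Sketch of hybrid (coarse + fine) pruning for one conv layer.

    weights: array of shape (out_channels, in_channels, kH, kW).
    accuracy_ok: stand-in for evaluating the thinned network against
    a threshold accuracy; always True here for illustration.
    """
    out_ch = weights.shape[0]

    # Coarse stage: rank channels by the L1 norm of their weights
    # (one plausible reading of "based on weight values").
    scores = np.abs(weights).reshape(out_ch, -1).sum(axis=1)
    n_keep = int(round(out_ch * channel_keep_ratio))
    # Round the number of unpruned channels up to a multiple suited
    # to the target hardware architecture.
    n_keep = min(out_ch, ((n_keep + hw_multiple - 1) // hw_multiple) * hw_multiple)
    keep = np.argsort(scores)[::-1][:n_keep]

    thinned = weights.copy()
    pruned_mask = np.ones(out_ch, dtype=bool)
    pruned_mask[keep] = False
    thinned[pruned_mask] = 0.0  # set all weights in pruned channels to zero

    # Fine stage: only after the thinned network exceeds the threshold
    # accuracy, additionally zero the smallest-magnitude weights inside
    # the surviving channels, leaving the rest unmodified.
    if accuracy_ok(thinned):
        kept = thinned[keep]
        thresh = np.quantile(np.abs(kept), fine_sparsity)
        kept[np.abs(kept) < thresh] = 0.0
        thinned[keep] = kept
    return thinned
```

In this sketch the coarse stage removes structure the hardware can exploit directly (fewer, aligned channels), while the fine stage trades additional unstructured sparsity for a smaller model only when accuracy headroom remains.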