US 11,907,843 B2
Importance-aware model pruning and re-training for efficient convolutional neural networks
Anbang Yao, Beijing (CN); Yiwen Guo, Beijing (CN); and Yurong Chen, Beijing (CN)
Assigned to Intel Corporation, Santa Clara, CA (US)
Appl. No. 16/305,626
Filed by Intel Corporation, Santa Clara, CA (US)
PCT Filed Jun. 30, 2016, PCT No. PCT/CN2016/087859
§ 371(c)(1), (2) Date Nov. 29, 2018,
PCT Pub. No. WO2018/000309, PCT Pub. Date Jan. 4, 2018.
Prior Publication US 2020/0334537 A1, Oct. 22, 2020
Int. Cl. G06N 3/082 (2023.01); G06F 18/241 (2023.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)
CPC G06N 3/082 (2013.01) [G06F 18/241 (2023.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)] 25 Claims
OG exemplary drawing
 
1. An apparatus comprising:
a memory; and
hardware logic coupled to the memory, wherein the hardware logic is implemented in one or more of configurable logic or fixed-functionality logic hardware, the hardware logic coupled to the memory to:
conduct an importance measurement of a plurality of parametric generators in a trained neural network with one or more comparators that compare two or more parameter values that contain covariance matrix information, wherein the covariance matrix information is to indicate that first values of an inverse covariance matrix are greater than second values of the inverse covariance matrix, wherein the first values correspond to a first subset of a plurality of parameters and the second values correspond to a second subset of the plurality of parameters;
set the first subset of the plurality of parameters to zero and bypass a modification of the second subset of the plurality of parameters to zero based on the first values being greater than the second values to obtain a pruned neural network, wherein one or more parameters in the first subset are to be less than one or more parameters of the second subset of the plurality of parameters;
re-train the pruned neural network; and
iteratively conduct the importance measurement, wherein the hardware logic coupled to the memory is to iteratively set the first subset of the plurality of parameters to zero and iteratively re-train the pruned neural network until the pruned neural network satisfies a sparsity condition.
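The claimed loop, measuring importance from inverse-covariance values, zeroing the less important subset, re-training, and repeating until a sparsity condition is met, can be sketched in Python. This is a minimal illustration, not the patented implementation: `inv_cov_diag` stands in for the claim's inverse covariance matrix information (here only its diagonal), and `retrain_fn` is a hypothetical caller-supplied callback that fine-tunes the pruned network while preserving the zeroed parameters.

```python
import numpy as np

def prune_and_retrain(weights, inv_cov_diag, prune_fraction, retrain_fn,
                      target_sparsity, max_iters=10):
    """Iterative importance-aware pruning (illustrative sketch).

    weights       : 1-D array of flattened network parameters.
    inv_cov_diag  : per-parameter diagonal of an estimated inverse
                    covariance matrix; in this sketch, larger values
                    mark the less important ("first") subset.
    retrain_fn    : hypothetical callback that re-trains and returns
                    updated weights, keeping pruned entries at zero.
    """
    w = weights.copy()
    for _ in range(max_iters):
        # Importance measurement: compare inverse-covariance values
        # across the parameters that are still non-zero.
        alive = w != 0.0
        k = int(prune_fraction * alive.sum())
        if k == 0:
            break
        # Select the k surviving parameters whose inverse-covariance
        # values are the LARGEST (the claim's "first values being
        # greater than the second values").
        candidates = np.where(alive)[0]
        order = candidates[np.argsort(inv_cov_diag[candidates])[::-1]]
        w[order[:k]] = 0.0          # set the first subset to zero
        w = retrain_fn(w)           # re-train the pruned network
        sparsity = 1.0 - np.count_nonzero(w) / w.size
        if sparsity >= target_sparsity:   # sparsity condition satisfied
            break
    return w
```

With an identity `retrain_fn`, each pass prunes a fixed fraction of the surviving parameters, so sparsity compounds across iterations until the target is reached or `max_iters` is exhausted.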