US 12,236,341 B2
Bank-balanced-sparse activation feature maps for neural network models
Enxu Yan, Los Altos, CA (US); Dongkuan Xu, Los Altos, CA (US); and Jiachao Liu, Los Altos, CA (US)
Assigned to MOFFETT INTERNATIONAL CO., LIMITED, Kowloon (HK)
Filed by Moffett International Co., Limited, Kowloon (HK)
Filed on Sep. 30, 2020, as Appl. No. 17/038,557.
Prior Publication US 2022/0101118 A1, Mar. 31, 2022
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)]
21 Claims
 
1. A method to generate a deep neural network (DNN) model, comprising:
determining a first deep neural network (DNN) model having one or more hidden layers;
determining a bank size, a bank layout, and a target sparsity, the target sparsity specifying a sparsity for a bank of an activation feature map for the one or more hidden layers;
generating a second DNN model based on the first DNN model, wherein a hidden layer of the second DNN model includes a dynamic mask to mask an activation feature map of the hidden layer; and
retraining the second DNN model by:
determining an output tensor corresponding to an activation feature map at a corresponding hidden layer of the second DNN model;
determining a plurality of banks based on the bank size and the bank layout for the output tensor, wherein the output tensor includes the plurality of banks based on the bank size and the bank layout;
computing the dynamic mask based on the output tensor;
applying the dynamic mask to the output tensor by performing component-wise multiplication between the output tensor corresponding to the activation feature map and the mask; and
increasing a sparsity for each of the plurality of banks in the output tensor until the sparsity is equal to or greater than the target sparsity while ensuring the second DNN model converges, wherein the second DNN model includes activations among the plurality of banks at the output tensor such that a number of non-zero elements in each of the plurality of banks is the same and the second DNN model is used for inferencing.
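
The masking steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: it assumes a 2-D output tensor, a row-wise bank layout (each row split into contiguous banks of `bank_size` elements), and a top-k-by-magnitude selection rule, none of which the claim fixes. The function names `bank_balanced_mask` and `sparsity_schedule` and the linear ramp are hypothetical and chosen only for illustration.

```python
import numpy as np


def bank_balanced_mask(activations, bank_size, sparsity):
    """Build a 0/1 mask that keeps the same number of elements in every bank.

    Assumptions (not taken from the claim): the tensor is 2-D, banks are
    contiguous row-wise blocks, and the largest magnitudes are kept.
    """
    rows, cols = activations.shape
    if cols % bank_size != 0:
        raise ValueError("row length must be a multiple of bank_size")
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")

    # Elements zeroed per bank; rounding guards against float noise
    # such as 0.7 * 10 == 7.000000000000001.
    n_zero = int(np.ceil(round(sparsity * bank_size, 9)))
    n_keep = bank_size - n_zero

    # View the tensor as (rows, banks per row, bank_size).
    banks = np.abs(activations).reshape(rows, cols // bank_size, bank_size)

    # Indices of the n_keep largest magnitudes inside each bank.
    order = np.argsort(-banks, axis=-1, kind="stable")
    keep_idx = order[..., :n_keep]

    mask = np.zeros_like(banks)
    np.put_along_axis(mask, keep_idx, 1, axis=-1)
    return mask.reshape(rows, cols)


def sparsity_schedule(target, steps):
    """Hypothetical linear ramp of per-bank sparsity up to the target."""
    return [target * (i + 1) / steps for i in range(steps)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    output_tensor = rng.standard_normal((2, 8))

    for s in sparsity_schedule(target=0.5, steps=2):
        # The mask is recomputed from the current output tensor each time.
        mask = bank_balanced_mask(output_tensor, bank_size=4, sparsity=s)
        masked = output_tensor * mask  # component-wise multiplication
        per_bank = mask.reshape(2, 2, 4).sum(axis=-1)
        print(f"sparsity={s:.2f} non-zeros per bank:\n{per_bank}")
```

In an actual retraining loop the mask would be recomputed from each forward pass and the sparsity raised only as long as the model keeps converging; the loop above only shows that every bank ends up with the same number of retained elements at each sparsity level.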