US 12,229,668 B2
Operation method and apparatus for network layer in deep neural network
Yuan Zhang, Hangzhou (CN); Di Xie, Hangzhou (CN); and Shiliang Pu, Hangzhou (CN)
Assigned to HANGZHOU HIKVISION DIGITAL TECHNOLOGY CO., LTD., Hangzhou (CN)
Appl. No. 17/254,002
Filed by HANGZHOU HIKVISION DIGITAL TECHNOLOGY CO., LTD., Hangzhou (CN)
PCT Filed Jun. 24, 2019, PCT No. PCT/CN2019/092553 § 371(c)(1), (2) Date Dec. 18, 2020, PCT Pub. No. WO2020/004101, PCT Pub. Date Jan. 2, 2020.
Claims priority of application No. 201810679580.X (CN), filed on Jun. 27, 2018.
Prior Publication US 2021/0271973 A1, Sep. 2, 2021
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01)

CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)]

15 Claims

1. An operation method for a network layer in a Deep Neural Network, wherein the method is applied to target detection and segmentation, behavior detection and recognition or voice recognition and performed by an electronic device, wherein the electronic device includes a processor and a machine-readable storage medium, the processor of the electronic device, by reading machine-executable instructions stored in the machine-readable medium and executing the machine-executable instructions, implements the following operations comprising:

acquiring a weighted tensor of the network layer in the Deep Neural Network, wherein the weighted tensor comprises a plurality of filters;

for each of the filters of the network layer, converting the filter into a linear combination of a plurality of fixed-point convolution kernels by splitting the filter, wherein a weight value of each of the fixed-point convolution kernels is a fixed-point quantized value having a specified bit-width;

for each of the filters of the network layer, performing a convolution operation on input data of the network layer and each of the fixed-point convolution kernels of the filter, respectively, to obtain a plurality of convolution results, and calculating a weighted sum of the obtained convolution results based on the linear combination of the plurality of fixed-point convolution kernels of the filter to obtain an operation result of the filter; and

determining output data of the network layer, which is composed of the obtained operation results of the filters;

wherein, the network layer comprises a convolution layer, and the size of the weighted tensor of the convolution layer is S×S×I×O, wherein O represents the number of filters included in the weighted tensor of the convolution layer;

for each of the filters of the network layer, converting the filter into the linear combination of the plurality of fixed-point convolution kernels by splitting the filter comprises:

for each filter of the convolution layer, converting the filter into the linear combination of the plurality of fixed-point convolution kernels by splitting the filter based on a preset splitting formula; wherein, the preset splitting formula is:

wⁱ = Σ_{j=1}^{p} α_j t_j,  B = Σ_{j=1}^{p} b_j

wherein, wⁱ is the i-th filter of the convolution layer, i ∈ [1, O], p is the number of the fixed-point convolution kernels obtained by splitting the filter wⁱ, α_j is a preset linear weighting coefficient of the j-th fixed-point convolution kernel, t_j is the j-th fixed-point convolution kernel and the size of t_j is S×S×I, B is the preset quantized number of bits, and b_j is a specified bit-width corresponding to the j-th fixed-point convolution kernel.
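
The operations recited in claim 1 can be illustrated with a minimal NumPy sketch. The claim does not state how the kernels t_j or the coefficients α_j are chosen, so everything specific here is an assumption made for illustration only: the filter is first quantized to a B-bit signed integer with a single scale factor, then bit-sliced into p chunks of b_j bits (two's complement, with only the top slice signed), and α_j is taken as the scale times the power of two for that slice. The helper names `split_filter` and `conv2d_valid` are hypothetical, and the convolution is a plain unpadded, stride-1 sliding window.

```python
import numpy as np

def split_filter(w, bit_widths):
    """Split one S x S x I filter into low-bit kernels t_j and coefficients alpha_j.

    Assumed scheme (not taken from the claim): quantize w to a signed B-bit
    integer q with B = sum(bit_widths), then slice q into chunks of b_j bits.
    """
    B = sum(bit_widths)
    # Guard against an all-zero filter so the scale is never zero.
    scale = max(float(np.max(np.abs(w))), 1e-12) / (2 ** (B - 1) - 1)
    q = np.round(w / scale).astype(np.int64)  # signed B-bit integer codes

    alphas, kernels = [], []
    shift = 0
    for j, b in enumerate(bit_widths):
        if j < len(bit_widths) - 1:
            t = (q >> shift) & ((1 << b) - 1)  # unsigned b-bit slice
        else:
            t = q >> shift  # top slice keeps the sign (arithmetic shift)
        kernels.append(t)
        alphas.append(scale * 2.0 ** shift)
        shift += b
    return alphas, kernels, scale * q

def conv2d_valid(x, k):
    """Unpadded, stride-1 sliding-window product of x (H x W x I) with k (S x S x I)."""
    S = k.shape[0]
    H, W, _ = x.shape
    out = np.empty((H - S + 1, W - S + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + S, c:c + S, :] * k)
    return out

rng = np.random.default_rng(0)
S, I = 3, 4
w = rng.standard_normal((S, S, I))   # one filter w^i of the layer
x = rng.standard_normal((8, 8, I))   # input data of the layer
bit_widths = [2, 2, 2, 2]            # b_j, so B = 8 and p = 4

alphas, kernels, w_q = split_filter(w, bit_widths)

# The linear combination of the kernels reproduces the B-bit quantized filter.
w_rec = sum(a * t for a, t in zip(alphas, kernels))

# Convolve the input with each low-bit kernel, then take the weighted sum.
y_split = sum(a * conv2d_valid(x, t) for a, t in zip(alphas, kernels))
y_direct = conv2d_valid(x, w_q)

print(np.allclose(w_rec, w_q), np.allclose(y_split, y_direct))
```

Because convolution is linear in the kernel, the weighted sum of the p low-bit convolution results equals the result of convolving with the B-bit quantized filter directly. Repeating this for each of the O filters and stacking the results would give the output data of the layer.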