US 12,271,807 B2
Convolutional neural network computing method and system based on weight kneading
Xiaowei Li, Beijing (CN); Xin Wei, Beijing (CN); and Hang Lu, Beijing (CN)
Assigned to Institute of Computing Technology, Chinese Academy of Sciences, Beijing (CN)
Appl. No. 17/250,892
Filed by Institute of Computing Technology, Chinese Academy of Sciences, Beijing (CN)
PCT Filed May 21, 2019, PCT No. PCT/CN2019/087767 § 371(c)(1), (2) Date Mar. 19, 2021, PCT Pub. No. WO2020/057160, PCT Pub. Date Mar. 26, 2020.
Claims priority of application No. 201811100309.2 (CN), filed on Sep. 20, 2018.
Prior Publication US 2021/0350214 A1, Nov. 11, 2021
Int. Cl. G06N 3/048 (2023.01); G06F 5/01 (2006.01); G06F 7/50 (2006.01); G06F 7/544 (2006.01); G06F 17/16 (2006.01); G06N 3/04 (2023.01); G06N 3/063 (2023.01); H03M 7/40 (2006.01)

CPC G06N 3/048 (2023.01) [G06F 5/01 (2013.01); G06F 7/50 (2013.01); G06F 7/5443 (2013.01); G06F 17/16 (2013.01); G06N 3/04 (2013.01); G06N 3/063 (2013.01); H03M 7/40 (2013.01); G06F 2207/386 (2013.01)]

10 Claims

1. A convolutional neural network computing system based on weight kneading, comprising:

a weight kneading module for acquiring multiple groups of activations to be operated and corresponding original weights, arranging the original weights in a computation sequence and aligning by bit to obtain a weight matrix, removing slack bits in the weight matrix to obtain a reduced matrix with vacancies, allowing essential bits in each column of the reduced matrix to fill the vacancies according to the computation sequence to obtain an intermediate matrix, removing null rows in the intermediate matrix, and placing zeros at vacancies of the intermediate matrix to obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; and

a split accumulation module for obtaining, according to a correspondence relationship between the activations and the essential bits in the original weights, positional information of the activation corresponding to each bit of the kneading weight, sending the kneading weight to a split accumulator, which divides the kneading weight by bit into multiple weight segments, processing summation of the weight segments and the corresponding activations according to the positional information, and sending a processing result to an adder tree to obtain an output feature map by means of executing shift-and-add on the processing result,

wherein the original weights are 16-bit fixed-point numbers, and the split accumulator comprises splitters for dividing the kneading weight by bit.
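
The data flow recited in claim 1 can be illustrated with a small software model. This is a sketch under stated assumptions, not the claimed hardware: the weights are treated as unsigned 16-bit magnitudes (sign handling is omitted), and the names `knead` and `split_accumulate` are hypothetical. Each kneaded bit keeps the index of the original weight it came from, which stands in for the positional information of the corresponding activation.

```python
from typing import List, Optional

BITS = 16  # claim 1: the original weights are 16-bit fixed-point numbers


def knead(weights: List[int], bits: int = BITS) -> List[List[Optional[int]]]:
    """Model of the weight kneading module (assumes unsigned magnitudes).

    Entry [r][b] of the result is the index of the original weight (and hence
    of the activation) whose essential bit now sits at bit position b of
    kneaded row r, or None where a vacancy was filled with zero.
    """
    # For each bit column, list in computation order the weights that have an
    # essential (1) bit there; slack (0) bits are skipped, so the essential
    # bits move up to fill the vacancies in their own column.
    columns = [[i for i, w in enumerate(weights) if (w >> b) & 1]
               for b in range(bits)]
    # Only as many rows as the fullest column are built, so null rows vanish.
    depth = max((len(col) for col in columns), default=0)
    return [[col[r] if r < len(col) else None for col in columns]
            for r in range(depth)]


def split_accumulate(kneaded: List[List[Optional[int]]],
                     activations: List[int],
                     bits: int = BITS) -> int:
    """Model of the split accumulator followed by the shift-and-add stage."""
    segment_sums = [0] * bits  # one accumulator per bit segment
    for row in kneaded:
        # Split the kneaded weight by bit; each set bit selects one activation.
        for b, src in enumerate(row):
            if src is not None:
                segment_sums[b] += activations[src]
    # Shift-and-add: weight each segment sum by its bit significance.
    return sum(s << b for b, s in enumerate(segment_sums))


weights = [0b0101, 0b0011, 0b1000]
activations = [3, 5, 7]
kneaded = knead(weights)
result = split_accumulate(kneaded, activations)
```

In this example the three original weights knead into two rows, and `result` equals the ordinary dot product of `weights` and `activations`, so the sketch preserves the convolution result while processing fewer weight rows.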