US 11,748,601 B2
Integrated circuit chip device
Shaoli Liu, Beijing (CN); Xinkai Song, Beijing (CN); Bingrui Wang, Beijing (CN); Yao Zhang, Beijing (CN); and Shuai Hu, Beijing (CN)
Assigned to CAMBRICON TECHNOLOGIES CORPORATION LIMITED, Beijing (CN)
Filed by CAMBRICON TECHNOLOGIES CORPORATION LIMITED, Beijing (CN)
Filed on Dec. 27, 2020, as Appl. No. 17/134,444.
Application 17/134,444 is a continuation of application No. 16/903,304, filed on Jun. 16, 2020, granted, now Pat. No. 11,544,546.
Application 16/903,304 is a continuation of application No. PCT/CN2018/123929, filed on Dec. 26, 2018.
Claims priority of application No. 201711455388.4 (CN), filed on Dec. 27, 2017; application No. 201711455397.3 (CN), filed on Dec. 27, 2017; application No. 201711466943.3 (CN), filed on Dec. 28, 2017; application No. 201711468629.9 (CN), filed on Dec. 28, 2017; application No. 201711469408.3 (CN), filed on Dec. 28, 2017; application No. 201711469614.4 (CN), filed on Dec. 28, 2017; and application No. 201711469615.9 (CN), filed on Dec. 28, 2017.
Prior Publication US 2021/0150324 A1, May 20, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/063 (2023.01); G06N 3/04 (2023.01)
CPC G06N 3/063 (2013.01) [G06N 3/04 (2013.01)] 20 Claims
OG exemplary drawing
 
1. An integrated circuit chip device for training a neural network having n layers, n being an integer greater than or equal to 2, wherein the integrated circuit chip device comprises:
a main processing circuit; and
a plurality of basic processing circuits;
wherein:
the main processing circuit comprises a data type conversion circuit configured to convert data between a floating point data type and a fixed point data type;
the integrated circuit chip device is configured to:
receive a training instruction;
determine input data and weight group data of a first layer according to the training instruction; and
perform a forward computation of an ith layer of the neural network on the input data and the weight group data of the first layer to obtain an ith output result of the forward computation, i being an integer greater than or equal to 1 and smaller than or equal to n;
the main processing circuit is further configured to:
obtain an ith output result gradient according to the ith output result;
obtain an ith backward computation of backward computations of the ith layer according to the training instruction;
obtain an ith backward computation complexity according to the ith output result gradient, input data of the ith layer, weight group data of the ith layer, and the ith backward computation;
determine an ith back data type corresponding to the ith output result gradient, the input data of the ith layer, and the weight group data of the ith layer according to the ith backward computation complexity; and
classify the ith output result gradient, the input data of the ith layer, and the weight group data of the ith layer into a broadcasting data block and a distribution data block according to a type of the ith backward computation;
at least one of the plurality of basic processing circuits is configured to:
perform computations on the broadcasting data block of the ith back data type and received basic data blocks of the ith back data type to obtain computation results; and
transfer the computation results to the main processing circuit;
the main processing circuit is further configured to:
process the computation results to obtain a weight group gradient of the ith layer and an input data gradient of the ith layer; and
update the weight group data of the ith layer according to the weight group gradient of the ith layer, wherein the ith back data type includes a fixed point type or a floating point type;
the integrated circuit chip device is further configured to:
perform backward computations of an (i−1)th layer using the input data gradient of the ith layer as an (i−1)th output result gradient of the (i−1)th layer to obtain a weight group gradient of the (i−1)th layer; and
update weight group data of a corresponding layer according to the weight group gradient of the (i−1)th layer, wherein the weight group data includes at least two weights; and
the main processing circuit is further configured to:
when the ith backward computation is a multiplication computation, classify both the input data of the ith layer and the weight group data of the ith layer into distribution data blocks, and classify the ith output result gradient into a broadcasting data block; and
when the ith backward computation is a convolution computation, classify both the input data of the ith layer and the weight group data of the ith layer into broadcasting data blocks, and the ith output result gradient into a distribution data block.
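The claimed backward pass combines three mechanisms: selecting a fixed point or floating point "back data type" from the backward computation's complexity, classifying operands into broadcasting versus distribution data blocks by operation type, and updating weights from the computed gradient. The sketch below illustrates those mechanisms only; it is not the patented implementation. The quantization scheme (frac_bits, rounding), the complexity threshold, and the SGD-style update are all illustrative assumptions, as the claim does not specify them.

```python
import numpy as np

# Hypothetical fixed point scheme: the claim only recites conversion between
# floating point and fixed point; frac_bits and rounding are assumptions.
def to_fixed(x, frac_bits=8):
    return np.round(np.asarray(x) * (1 << frac_bits)).astype(np.int32)

def to_float(q, frac_bits=8):
    return q.astype(np.float64) / (1 << frac_bits)

def choose_back_data_type(complexity, threshold=1_000_000):
    # The claim derives an "ith back data type" from the ith backward
    # computation complexity; this threshold rule is an illustrative stand-in
    # (heavier computations favor cheaper fixed point arithmetic).
    return "fixed" if complexity > threshold else "float"

def classify_blocks(op_type, layer_input, weights, out_grad):
    # Mirrors the claim's final two limitations:
    #   multiplication -> input & weights are distribution blocks,
    #                     output result gradient is the broadcasting block;
    #   convolution    -> input & weights are broadcasting blocks,
    #                     output result gradient is the distribution block.
    if op_type == "multiplication":
        return {"broadcasting": [out_grad],
                "distribution": [layer_input, weights]}
    if op_type == "convolution":
        return {"broadcasting": [layer_input, weights],
                "distribution": [out_grad]}
    raise ValueError(f"unsupported backward computation: {op_type}")

def update_weights(weights, weight_grad, lr=0.01):
    # "update the weight group data of the ith layer according to the weight
    # group gradient of the ith layer" -- a plain SGD step as a stand-in.
    return weights - lr * weight_grad
```

In this reading, the main processing circuit would run `choose_back_data_type` and `classify_blocks`, convert the resulting blocks with `to_fixed`/`to_float`, hand distribution blocks to the basic processing circuits while broadcasting the broadcasting block, and apply `update_weights` to the gradient assembled from their results.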