US 12,423,582 B2
Apparatus and methods for training in convolutional neural networks
Yunji Chen, Beijing (CN); Tian Zhi, Beijing (CN); Shaoli Liu, Beijing (CN); Qi Guo, Beijing (CN); and Tianshi Chen, Beijing (CN)
Assigned to CAMBRICON TECHNOLOGIES CORPORATION LIMITED, Beijing (CN)
Filed by Cambricon Technologies Corporation Limited, Beijing (CN)
Filed on Dec. 11, 2019, as Appl. No. 16/709,968.
Application 16/709,968 is a continuation of application No. 16/174,165, filed on Oct. 29, 2018, granted, now Pat. No. 10,643,129.
Application 16/174,165 is a continuation-in-part of application No. PCT/CN2016/081088, filed on May 5, 2016.
Claims priority of application No. 201610283838.5 (CN), filed on Apr. 29, 2016.
Prior Publication US 2020/0111007 A1, Apr. 9, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/084 (2023.01); G06F 7/544 (2006.01); G06N 3/08 (2023.01)
CPC G06N 3/084 (2013.01) [G06F 7/5443 (2013.01); G06N 3/08 (2013.01)] 20 Claims
OG exemplary drawing
 
1. An apparatus for backpropagation of a convolutional neural network, comprising:
a controller circuit configured to receive an instruction; and
a computation circuit that includes a master computation circuit and a plurality of slave computation circuits, wherein the master computation circuit and the plurality of slave computation circuits are connected via an interconnection circuit,
wherein each of the computation circuit, the master computation circuit, the plurality of slave computation circuits, and the interconnection circuit is an application specific integrated circuit, and
wherein the master computation circuit is configured to:
receive input data,
divide the input data into a plurality of portions, and
select a portion of the plurality of portions of the input data based on a predetermined convolution window, associated with a respective slave computation circuit of the plurality of slave computation circuits, in response to the instruction,
wherein each slave computation circuit of the plurality of slave computation circuits is configured to respectively convolute the selected portion of the plurality of portions of the input data with one of one or more respectively calculated first data gradients to generate a kernel gradient,
wherein the master computation circuit is further configured to update a prestored convolution kernel based on the kernel gradient,
wherein the master computation circuit is further configured to calculate one or more second data gradients based on a derivative of an activation function and a sum of one or more multiplication results between the first data gradients and the portion of the prestored convolution kernel, and
wherein the activation function is a function selected from the group consisting of a sigmoid function, a tanh function, a relu function, and a softmax function.
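The claimed circuits carry out the standard convolutional backward pass: the slave circuits convolve input windows with the incoming ("first") data gradients to produce the kernel gradient, and the master circuit updates the prestored kernel and forms the "second" data gradients from gradient-times-kernel sums scaled by the activation derivative. The following is a minimal single-channel NumPy sketch of that computation; the function names, the valid-convolution geometry, the relu choice, and the learning rate are illustrative assumptions and are not taken from the patent.

    import numpy as np

    def conv2d_valid(x, k):
        """Plain 2D 'valid' cross-correlation: slide k over x with no padding."""
        xh, xw = x.shape
        kh, kw = k.shape
        out = np.zeros((xh - kh + 1, xw - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # Each (i, j) position is one "convolution window" over the input.
                out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
        return out

    def relu_derivative(z):
        """Derivative of relu, one of the claimed activation choices."""
        return (z > 0).astype(z.dtype)

    def backward_step(x, kernel, first_data_grad, input_preact, lr=0.01):
        """One backward pass of a single-channel convolution layer.

        x               -- input data to the layer (its forward-pass input)
        kernel          -- the "prestored convolution kernel"
        first_data_grad -- gradient arriving from the downstream layer
        input_preact    -- pre-activation values from which x was produced
                           (same shape as x); hypothetical name
        """
        # Kernel gradient: convolve each selected portion of the input with
        # the first data gradient (the work the claim assigns to the slave
        # computation circuits).
        kernel_grad = conv2d_valid(x, first_data_grad)

        # Update the prestored kernel from the kernel gradient (the work the
        # claim assigns to the master circuit); lr is an assumed learning rate.
        new_kernel = kernel - lr * kernel_grad

        # Second data gradient: a sum of multiplication results between the
        # first data gradients and portions of the prestored kernel (a "full"
        # convolution with the 180-degree-rotated kernel), scaled by the
        # derivative of the activation function.
        kh, kw = kernel.shape
        padded = np.pad(first_data_grad, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
        second_data_grad = conv2d_valid(padded, np.rot90(kernel, 2))
        second_data_grad *= relu_derivative(input_preact)

        return new_kernel, second_data_grad

A quick shape check under these assumptions: for an 8x8 input and a 3x3 kernel, the forward output (and hence first_data_grad) is 6x6; conv2d_valid(x, first_data_grad) then yields a 3x3 kernel gradient, and the padded full convolution returns an 8x8 second data gradient matching the input, which is what lets the gradient propagate to the preceding layer.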