US 12,205,036 B2
Apparatus and methods for training in fully connected layers of convolutional networks
Qi Guo, Beijing (CN); Shijin Zhang, Beijing (CN); Yunji Chen, Beijing (CN); and Tianshi Chen, Beijing (CN)
Assigned to CAMBRICON TECHNOLOGIES CORPORATION LIMITED, Beijing (CN)
Filed by Cambricon Technologies Corporation Limited, Beijing (CN)
Filed on Oct. 29, 2018, as Appl. No. 16/174,050.
Application 16/174,050 is a continuation-in-part of application No. PCT/CN2016/081114, filed on May 5, 2016.
Claims priority of application No. 201610285062.0 (CN), filed on Apr. 29, 2016.
Prior Publication US 2019/0065958 A1, Feb. 28, 2019
Int. Cl. G06N 3/084 (2023.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/084 (2013.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01)] 10 Claims
OG exemplary drawing
 
1. An integrated circuit (IC) chip for backpropagation in a fully connected layer of a neural network, comprising:
a controller circuit configured to receive an instruction; and
one or more computation circuits that include:
a master computation circuit,
one or more slave computation circuits, and
an interconnection circuit communicatively connected to the master computation circuit and the one or more slave computation circuits,
wherein the master computation circuit is configured to
receive input data and one or more first data gradients in response to the instruction, and
transmit the input data and the one or more first data gradients to the one or more slave computation circuits, and
wherein the one or more slave computation circuits are respectively configured to multiply one of the one or more first data gradients with the input data to generate a default weight gradient vector,
wherein the master computation circuit is further configured to update one or more weight values based on the default weight gradient vector,
wherein the master computation circuit is further configured to apply a derivative of an activation function to the one or more first data gradients to generate one or more input gradients,
wherein the one or more slave computation circuits are respectively configured to multiply one of the one or more input gradients with one or more weight vectors in a weight matrix to generate one or more multiplication results, and
wherein the interconnection circuit is configured to combine the one or more multiplication results of a lower dimension into an output gradient vector of a higher dimension.
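The claim describes a two-phase backpropagation data flow: the slave circuits form a weight gradient from the data gradients and the input, the master updates the weights and applies the activation-function derivative, and the interconnection circuit assembles the slaves' per-gradient products into the output gradient vector for the preceding layer. The following is a minimal, serialized sketch of that data flow in NumPy, offered only as an illustration of the arithmetic; the dimensions, the sigmoid activation, the learning rate, and all variable names are assumptions not taken from the patent, and the per-slave steps that the claimed hardware performs in parallel are modeled here as a simple loop.

import numpy as np

# Illustrative sizes and data; none of these values come from the patent.
rng = np.random.default_rng(0)
n_in, n_out = 4, 3                        # fully connected layer: n_in inputs -> n_out outputs
x = rng.standard_normal(n_in)             # input data received by the master circuit
grad = rng.standard_normal(n_out)         # "first data gradients" from the next layer
W = rng.standard_normal((n_out, n_in))    # weight matrix; one weight vector per output
a = 1.0 / (1.0 + np.exp(-W @ x))          # activations, assumed cached from the forward pass
lr = 0.01                                 # assumed learning rate

# Slave circuits: each multiplies one data gradient with the input data,
# yielding one row of the "default weight gradient vector".
dW = np.stack([g * x for g in grad])      # row-wise outer product

# Master circuit: update the weight values from the weight gradient.
W -= lr * dW

# Master circuit: apply the derivative of the activation function (sigmoid
# assumed here) to the data gradients to produce the input gradients.
input_grad = grad * a * (1.0 - a)

# Slave circuits: each multiplies one input gradient with one weight vector,
# producing a lower-dimension partial multiplication result.
partials = [g * w for g, w in zip(input_grad, W)]

# Interconnection circuit: combine the partial results into the
# higher-dimension output gradient vector for the preceding layer.
output_grad = np.sum(partials, axis=0)    # equivalent to W.T @ input_grad

In the claimed apparatus these steps are distributed across distinct hardware circuits and driven by a single instruction received at the controller circuit; the sketch above collapses that parallelism into sequential NumPy operations purely to show the quantities each circuit computes.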