US 12,175,356 B2
Arithmetic device
Kyohei Suwabe, Tokyo (JP); Kenya Sugihara, Tokyo (JP); and Seidai Takeda, Tokyo (JP)
Assigned to Mitsubishi Electric Corporation, Tokyo (JP)
Appl. No. 17/049,065
Filed by Mitsubishi Electric Corporation, Tokyo (JP)
PCT Filed Jan. 28, 2019, PCT No. PCT/JP2019/002780
§ 371(c)(1), (2) Date Oct. 20, 2020,
PCT Pub. No. WO2019/220692, PCT Pub. Date Nov. 21, 2019.
Claims priority of application No. 2018-093750 (JP), filed on May 15, 2018.
Prior Publication US 2021/0241083 A1, Aug. 5, 2021
Int. Cl. G06F 7/50 (2006.01); G06F 3/048 (2013.01); G06F 7/523 (2006.01); G06F 7/544 (2006.01); G06N 3/048 (2023.01); G06N 3/063 (2023.01)
CPC G06N 3/063 (2013.01) [G06F 7/50 (2013.01); G06F 7/523 (2013.01); G06F 7/5443 (2013.01); G06N 3/048 (2023.01)]
9 Claims
1. An arithmetic device for a computation in a convolutional layer of a convolutional neural network, the convolutional layer including a plurality of input neurons and a plurality of output neurons each connected to at least one of the plurality of input neurons, the arithmetic device comprising:
a first register that stores input data as a value of at least one of the plurality of input neurons;
a plurality of ports; and
a plurality of processing element groups that correspond to the plurality of ports, respectively, and can access the first register through respective corresponding ports,
wherein the arithmetic device is configured to compare a number of output neurons within the range of one convolutional kernel to a total number of processing elements, and divide the number of output neurons into at least one turn when the number of output neurons is greater than the total number of processing elements,
wherein each of the processing element groups includes a plurality of processing elements, and each of the processing elements is associated with at least one of the plurality of output neurons within a range of one convolutional kernel and performs a multiply-and-accumulate computation for the at least one associated output neuron in which a value of at least one input neuron connected to the at least one associated output neuron is multiplied by a weight coefficient corresponding to a turn number of the at least one turn and a result of multiplication is accumulated, and
wherein each of the processing element groups is configured to send a data access request to the first register to access one or more of the plurality of input neurons stored in the first register, the data request including an index string corresponding to the turn number of the at least one turn, and the index string including an index number of one or more of the plurality of input neurons.
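
The data flow recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed hardware and is not taken from the specification: the function names (split_into_turns, index_string, run_layer), the contiguous assignment of output neurons to processing elements, and the weight table keyed by (output neuron, input neuron) are all assumptions made for illustration. The sketch divides the output neurons into turns when they outnumber the processing elements, builds one index string per processing element group per turn, fetches only the indexed input values, and performs the multiply-and-accumulate computation for each output neuron.

```python
from math import ceil


def split_into_turns(num_outputs, num_groups, pes_per_group):
    """Divide output neurons into turns of at most (groups x PEs) neurons each."""
    total_pes = num_groups * pes_per_group
    num_turns = max(1, ceil(num_outputs / total_pes))
    turns = []
    for t in range(num_turns):
        start = t * total_pes
        turns.append(list(range(start, min(start + total_pes, num_outputs))))
    return turns


def index_string(outputs, connections):
    """Sorted, de-duplicated indices of the input neurons these outputs need."""
    return sorted({i for o in outputs for i in connections[o]})


def run_layer(inputs, connections, weights, num_groups, pes_per_group):
    """Compute the multiply-and-accumulate result of every output neuron."""
    results = [0.0] * len(connections)
    turns = split_into_turns(len(connections), num_groups, pes_per_group)
    for turn in turns:
        for g in range(num_groups):
            # Assumption: each group takes a contiguous slice of the turn.
            group_outputs = turn[g * pes_per_group:(g + 1) * pes_per_group]
            if not group_outputs:
                continue
            # One data access request per group: fetch only the indexed inputs.
            fetched = {i: inputs[i] for i in index_string(group_outputs, connections)}
            for o in group_outputs:  # one processing element per output neuron
                acc = 0.0
                for i in connections[o]:
                    acc += fetched[i] * weights[(o, i)]
                results[o] = acc
    return results


# Hypothetical example: 4 input neurons, 5 output neurons, 2 groups of 2 PEs.
inputs = [1.0, 2.0, 3.0, 4.0]
connections = [[0, 1], [1, 2], [2, 3], [0, 3], [1, 3]]
weights = {(o, i): float(o + 1) for o, conn in enumerate(connections) for i in conn}
results = run_layer(inputs, connections, weights, num_groups=2, pes_per_group=2)
```

With five output neurons and four processing elements in this hypothetical example, the work is divided into two turns, and the second turn uses only one processing element of the first group.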