US 12,450,481 B2
Distributed processing computer and distributed deep learning system
Kenji Tanaka, Tokyo (JP); Yuki Arikawa, Tokyo (JP); Tsuyoshi Ito, Tokyo (JP); Kazuhiko Terada, Tokyo (JP); and Takeshi Sakamoto, Tokyo (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 17/777,217
Filed by NTT, Inc., Tokyo (JP)
PCT Filed Dec. 2, 2019, PCT No. PCT/JP2019/046966
§ 371(c)(1), (2) Date May 16, 2022,
PCT Pub. No. WO2021/111490, PCT Pub. Date Jun. 10, 2021.
Prior Publication US 2022/0391701 A1, Dec. 8, 2022
Int. Cl. G06N 3/08 (2023.01)
CPC G06N 3/08 (2013.01) 12 Claims
OG exemplary drawing
 
1. A computer for distributed processing, comprising:
a plurality of accelerators, each having a part of a neural network assigned thereto, each of the plurality of accelerators being configured to derive a learning result based on input data and to update each parameter value included in the part of the neural network by using the learning result;
a plurality of network interface circuits, each being configured to transmit and receive information on learning including the learning result; and
an arithmetic processing unit configured to control the plurality of accelerators and the plurality of network interface circuits so as to cause each of the plurality of accelerators to derive a learning result based on input data, and to cause the plurality of network interface circuits to transmit and receive, in parallel, information on learning including the learning result derived by each of the plurality of accelerators, wherein
the plurality of accelerators perform, in pipelines, feed-forward processing and back-propagation processing on each of a plurality of pieces of input data included in a mini-batch, thereby deriving a plurality of learning results, one for each of the plurality of pieces of input data; and
the plurality of network interface circuits sequentially transmit and receive the information on learning, including the learning results, with the transmission and reception overlapped with the back-propagation processing, each time the plurality of accelerators derive a learning result based on one of the plurality of pieces of input data included in the mini-batch.
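The core of the claim is overlapping gradient communication with back-propagation: as soon as a stage of the pipeline produces its learning result for one piece of input data, a network interface circuit can transmit it while the accelerators continue back-propagating. The following is a minimal sketch of that overlap, not the patented implementation: it uses plain Python threads, a queue standing in for the network interface circuits, and dummy per-stage computation; all names (Accelerator, Nic, train_mini_batch) and timings are illustrative assumptions.

```python
# Sketch only: simulates the claimed overlap of gradient transmission
# with back-propagation. Names and timings are hypothetical.
import queue
import threading
import time

class Accelerator:
    """One pipeline stage holding a part of the network (here, one dummy weight)."""
    def __init__(self, stage_id):
        self.stage_id = stage_id
        self.weight = 1.0

    def feed_forward(self, x):
        time.sleep(0.01)           # stand-in for stage computation
        return x * self.weight

    def back_propagate(self, grad):
        time.sleep(0.01)           # stand-in for gradient computation
        return grad * self.weight  # the "learning result" for this stage

class Nic(threading.Thread):
    """Network interface stand-in: drains learning results and 'transmits'
    them while the accelerators keep back-propagating later inputs."""
    def __init__(self, outbox):
        super().__init__(daemon=True)
        self.outbox = outbox

    def run(self):
        while True:
            item = self.outbox.get()
            if item is None:       # sentinel: no more gradients
                break
            stage_id, micro_id, grad = item
            time.sleep(0.02)       # stand-in for network latency
            print(f"NIC sent gradient of stage {stage_id}, input {micro_id}")

def train_mini_batch(mini_batch, stages, outbox):
    # Feed-forward each piece of input data through the pipeline of stages.
    activations = []
    for x in mini_batch:
        for stage in stages:
            x = stage.feed_forward(x)
        activations.append(x)
    # Back-propagate; enqueue each stage's gradient as soon as it is ready,
    # so transmission overlaps the remaining back-propagation.
    for micro_id, y in enumerate(activations):
        grad = y                   # dummy loss gradient
        for stage in reversed(stages):
            grad = stage.back_propagate(grad)
            outbox.put((stage.stage_id, micro_id, grad))

if __name__ == "__main__":
    stages = [Accelerator(i) for i in range(4)]   # 4 accelerators, 4 stages
    outbox = queue.Queue()
    nic = Nic(outbox)
    nic.start()
    train_mini_batch([0.5, 1.0, 1.5], stages, outbox)  # mini-batch of 3 inputs
    outbox.put(None)
    nic.join()
```

In this sketch the consumer thread plays the role of the plurality of network interface circuits: the NIC's simulated transmit latency elapses concurrently with the accelerators' back-propagation of subsequent inputs, so communication cost is hidden behind computation rather than serialized after it. A full pipeline scheduler would additionally interleave the feed-forward of different inputs across stages, which is omitted here for brevity.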