US 12,131,246 B2
Distributed deep learning system, distributed deep learning method, and computing interconnect device
Junichi Kato, Tokyo (JP); Kenji Kawai, Tokyo (JP); Huycu Ngo, Tokyo (JP); Yuki Arikawa, Tokyo (JP); Tsuyoshi Ito, Tokyo (JP); and Takeshi Sakamoto, Tokyo (JP)
Assigned to Nippon Telegraph and Telephone Corporation, Tokyo (JP)
Appl. No. 17/255,209
Filed by Nippon Telegraph and Telephone Corporation, Tokyo (JP)
PCT Filed May 27, 2019, PCT No. PCT/JP2019/020906
§ 371(c)(1), (2) Date Dec. 22, 2020,
PCT Pub. No. WO2020/003849, PCT Pub. Date Jan. 2, 2020.
Claims priority of application No. 2018-119727 (JP), filed on Jun. 25, 2018.
Prior Publication US 2021/0216855 A1, Jul. 15, 2021
Int. Cl. G06N 3/045 (2023.01); G06N 3/063 (2023.01); G06N 5/043 (2023.01)
CPC G06N 3/045 (2023.01) [G06N 3/063 (2013.01); G06N 5/043 (2013.01)]
9 Claims
 
1. A distributed deep learning system comprising:
a plurality of computing interconnect devices connected with each other through a ring communication network through which communication is possible in one direction; and
a plurality of learning nodes connected with the plurality of computing interconnect devices in a one-to-one relation;
wherein each computing interconnect device of the plurality of computing interconnect devices includes:
a first reception unit configured to receive a first packet from a learning node of the plurality of learning nodes and acquire node data stored in the first packet, the learning node being connected with the computing interconnect device;
a second reception unit configured to receive a second packet from a second computing interconnect device of the plurality of computing interconnect devices that is adjacent to the computing interconnect device and acquire forwarding data stored in the second packet, wherein the second computing interconnect device is upstream from the computing interconnect device;
a first sorting unit configured to sort the forwarding data in accordance with a second reception completion flag included in the second packet and a role allocated to the computing interconnect device in advance, wherein the second reception completion flag indicates completion or incompletion of reception of the second packet;
a second sorting unit configured to sort the node data in accordance with a first reception completion flag included in the first packet and the role, wherein the first reception completion flag indicates completion or incompletion of reception of the first packet;
a first transmission unit configured to in response to the first sorting unit sorting the forwarding data to the first transmission unit or the second sorting unit sorting the node data to the first transmission unit, packetize the node data sorted by the second sorting unit or the forwarding data sorted by the first sorting unit into first packetized data and transmit the first packetized data to a third computing interconnect device adjacent to the computing interconnect device, wherein the third computing interconnect device is downstream from the computing interconnect device; and
a second transmission unit configured to in response to the first sorting unit sorting the forwarding data to the second transmission unit, packetize the forwarding data sorted by the first sorting unit into second packetized data and transmit the second packetized data to the learning node connected to the computing interconnect device;
wherein the first sorting unit is configured to sort the forwarding data to the first transmission unit and the second transmission unit when the second reception completion flag indicates incompletion of reception of the second packet and the role is a parent;
wherein the first sorting unit is configured to discard the forwarding data when the second reception completion flag indicates completion of reception of the second packet and the role is a parent;
wherein the second sorting unit sorts the node data to the first transmission unit when the first reception completion flag indicates incompletion of reception of the first packet and the role is a parent; and
wherein each learning node of the plurality of learning nodes includes:
a neural network configured to output a calculation result upon inputting of learning data;
a third transmission unit configured to packetize data into third packetized data and transmit the third packetized data to the computing interconnect device connected with the learning node;
a third reception unit configured to receive the second packetized data from the computing interconnect device connected with the learning node and acquire the forwarding data stored in the second packetized data when the second packetized data is transmitted to the learning node; and
a configuration parameter update unit configured to update configuration parameter data of the neural network based on the forwarding data.
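
The sorting rules that claim 1 recites for a computing interconnect device whose role is a parent can be read as a small decision table. The following Python sketch is illustrative only and is not part of the claim. The names `Dest`, `sort_forwarding_data`, and `sort_node_data` are hypothetical, each reception completion flag is modeled as a boolean (True meaning completion, False meaning incompletion), and any case that claim 1 does not recite returns None rather than a guessed behavior.

```python
from enum import Enum, auto
from typing import FrozenSet, Optional


class Dest(Enum):
    """Hypothetical labels for the two transmission units named in claim 1."""
    FIRST_TX = auto()   # first transmission unit: to the downstream interconnect device
    SECOND_TX = auto()  # second transmission unit: to the connected learning node


def sort_forwarding_data(reception_complete: bool, role: str) -> Optional[FrozenSet[Dest]]:
    """First sorting unit: route forwarding data taken from the second packet.

    An empty set means the forwarding data is discarded.
    None means claim 1 does not recite this case.
    """
    if role != "parent":
        return None  # roles other than parent are not recited in claim 1
    if reception_complete:
        return frozenset()  # parent + completion: discard
    # parent + incompletion: sort to both transmission units
    return frozenset({Dest.FIRST_TX, Dest.SECOND_TX})


def sort_node_data(reception_complete: bool, role: str) -> Optional[FrozenSet[Dest]]:
    """Second sorting unit: route node data taken from the first packet."""
    if role == "parent" and not reception_complete:
        # parent + incompletion: sort to the first transmission unit
        return frozenset({Dest.FIRST_TX})
    return None  # every other combination is not recited in claim 1


if __name__ == "__main__":
    print(sort_forwarding_data(False, "parent"))
    print(sort_forwarding_data(True, "parent"))
    print(sort_node_data(False, "parent"))
```

Returning a set of destinations keeps the three recited outcomes (one destination, both destinations, discard) distinct from the combinations the claim leaves open.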