US 12,217,167 B2
High performance computing system for deep learning
Minsoo Rhu, Daejeon (KR); and Youngeun Kwon, Daejeon (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Oct. 8, 2019, as Appl. No. 16/595,992.
Claims priority of application No. 10-2018-0120634 (KR), filed on Oct. 10, 2018.
Prior Publication US 2020/0117990 A1, Apr. 16, 2020
Int. Cl. G06N 3/08 (2023.01); G06F 13/28 (2006.01); G06F 15/173 (2006.01); G06N 3/10 (2006.01)
CPC G06N 3/08 (2013.01) [G06F 13/28 (2013.01); G06F 15/17375 (2013.01); G06N 3/10 (2013.01)]
19 Claims
 
1. A system comprising:
a host processor;
a plurality of accelerators configured to communicate with the host processor based on a communication interface; and
a plurality of memory nodes distinguished from a host memory and connected with the plurality of accelerators through an interconnection network,
wherein the interconnection network is between the plurality of accelerators and the plurality of memory nodes and comprises data links through which the accelerators directly access the memory nodes without interaction with the host processor and the host memory,
wherein a first accelerator of the plurality of accelerators directly accesses first and second memory nodes of the plurality of memory nodes through respective first and second data links of the data links of the interconnection network,
wherein a second accelerator of the plurality of accelerators connects to the first memory node through a third data link, the second accelerator being different from the first accelerator, and
wherein the first accelerator and the second accelerator directly exchange data with each other through the first memory node in which the first accelerator transfers data to the second accelerator through the first memory node, and the second accelerator transfers data to the first accelerator through the first memory node.
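
The topology recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class names `MemoryNode` and `Accelerator`, the methods `connect`, `write`, and `read`, and the keys `"grad"` and `"ack"` are all hypothetical, and a Python dictionary stands in for memory-node storage. The sketch only models which accelerator holds a data link to which memory node, and shows two accelerators exchanging data in both directions through the shared first memory node, with no host processor or host memory in the path.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Hypothetical stand-in for a memory node separate from host memory."""
    name: str
    storage: dict = field(default_factory=dict)


@dataclass
class Accelerator:
    """Hypothetical accelerator; `links` maps node name -> linked MemoryNode."""
    name: str
    links: dict = field(default_factory=dict)

    def connect(self, node: MemoryNode) -> None:
        # Models one data link of the interconnection network.
        self.links[node.name] = node

    def write(self, node_name: str, key: str, value) -> None:
        # Direct access: only nodes reachable over a data link can be written.
        self.links[node_name].storage[key] = value

    def read(self, node_name: str, key: str):
        # Raises KeyError if this accelerator has no link to the node.
        return self.links[node_name].storage[key]


node1, node2 = MemoryNode("node1"), MemoryNode("node2")
acc1, acc2 = Accelerator("acc1"), Accelerator("acc2")

acc1.connect(node1)  # first data link
acc1.connect(node2)  # second data link
acc2.connect(node1)  # third data link

# First accelerator -> second accelerator, through the first memory node.
acc1.write("node1", "grad", [0.1, 0.2])
received = acc2.read("node1", "grad")

# Second accelerator -> first accelerator, through the same memory node.
acc2.write("node1", "ack", "ok")
reply = acc1.read("node1", "ack")
```

In this model the second accelerator has no link to the second memory node, so `acc2.read("node2", ...)` would raise `KeyError`; the claim itself does not say whether such a link exists.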