US 12,124,889 B2
Efficient and more advanced implementation of ring-allreduce algorithm for distributed parallel deep learning
Liang Han, San Mateo, CA (US); and Yang Jiao, San Mateo, CA (US)
Assigned to T-Head (Shanghai) Semiconductor Co., Ltd., Shanghai Free Trade Area (CN)
Filed by T-Head (Shanghai) Semiconductor Co., Ltd., Shanghai Free Trade Area (CN)
Filed on Nov. 28, 2022, as Appl. No. 18/059,368.
Application 18/059,368 is a continuation of application No. 16/777,711, filed on Jan. 30, 2020, granted, now 11,520,640.
Prior Publication US 2023/0088237 A1, Mar. 23, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/52 (2006.01); G06F 9/48 (2006.01); G06N 20/00 (2019.01)
CPC G06F 9/52 (2013.01) [G06F 9/4881 (2013.01); G06N 20/00 (2019.01)] 20 Claims
 
1. A method for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects comprising: a first inter-group interconnect configured to communicatively and directly couple a computing node A of a first group of the plurality of groups with a computing node A of a second group neighboring the first group, a second inter-group interconnect configured to communicatively and directly couple a computing node B of the first group with a computing node B of the second group, a third inter-group interconnect configured to communicatively and directly couple a computing node C of the first group with a computing node C of the second group, and a fourth inter-group interconnect configured to communicatively and directly couple a computing node D of the first group with a computing node D of the second group, wherein the second group is aligned with the first group in a first dimension, the method comprising:
syncing data across the first dimension of computing nodes of the first group and the second group using a first set of ring connections, wherein the first set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes of the first group and the second group along the first dimension, and the syncing data across the first dimension comprises transferring, in a unit time, sub-data from a computing node along the first dimension to another computing node via a connection on the first set of ring connections; and
broadcasting synced data across the first dimension of computing nodes using the first set of ring connections.
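
The two steps of claim 1 can be illustrated with a small simulation. This is a minimal sketch of a textbook ring-allreduce (a reduce-scatter pass followed by an all-gather pass), not the patented implementation. The names `INTRA`, `linked`, and `ring_allreduce`, the `(group, node)` labeling, the particular ring A1 → B1 → B2 → A2, and the use of summation as the reduction are all assumptions made for illustration. The only parts taken from the claim are the interconnect pattern (A and D each coupled to B and C within a group; like-named nodes coupled between neighboring groups) and the idea that each node moves one piece of sub-data over one ring connection per unit time.

```python
# Intra-group interconnects named in claim 1: A-B, A-C, D-B, D-C.
INTRA = {("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")}


def linked(u, v):
    """Return True if nodes u and v, given as (group, name), share an interconnect."""
    (g1, n1), (g2, n2) = u, v
    if g1 == g2:
        return (n1, n2) in INTRA or (n2, n1) in INTRA
    # Inter-group interconnects couple like-named nodes of neighboring groups.
    return n1 == n2 and abs(g1 - g2) == 1


def ring_allreduce(buffers):
    """Simulate a sum ring-allreduce; buffers[i] is the data held by ring node i."""
    n = len(buffers)
    length = len(buffers[0])
    assert length % n == 0, "sketch assumes the data splits evenly into n chunks"
    size = length // n
    data = [list(b) for b in buffers]  # work on copies

    def chunk(c):
        return slice(c * size, (c + 1) * size)

    # Step 1 (sync): in each unit time every node sends one chunk to its
    # ring successor, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [((i - step) % n, data[i][chunk((i - step) % n)]) for i in range(n)]
        for src, (c, payload) in enumerate(sends):
            dst = (src + 1) % n
            data[dst][chunk(c)] = [a + b for a, b in zip(data[dst][chunk(c)], payload)]

    # Step 2 (broadcast): each fully reduced chunk travels around the ring,
    # overwriting the stale partial copy at every node it reaches.
    for step in range(n - 1):
        sends = [((i + 1 - step) % n, data[i][chunk((i + 1 - step) % n)]) for i in range(n)]
        for src, (c, payload) in enumerate(sends):
            data[(src + 1) % n][chunk(c)] = payload

    return data


# Hypothetical ring along the first dimension, spanning groups 1 and 2.
ring = [(1, "A"), (1, "B"), (2, "B"), (2, "A")]
ring_ok = all(linked(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring)))

result = ring_allreduce([
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [100, 200, 300, 400],
    [1000, 2000, 3000, 4000],
])
print(ring_ok, result[0])
```

With four nodes, each pass takes three unit-time steps, and every node ends up holding the same element-wise sum. A node pair such as A and D of the same group is rejected by `linked`, since the claim recites no direct interconnect between them.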