US 12,289,213 B2
Data synchronization method and apparatus, and device and storage medium
Fang Cao, Shandong (CN); Zhenhua Guo, Shandong (CN); Li Wang, Shandong (CN); Kai Gao, Shandong (CN); Yaqian Zhao, Shandong (CN); and Rengang Li, Shandong (CN)
Assigned to IEIT SYSTEMS CO., LTD., Shandong (CN)
Appl. No. 18/721,620
Filed by IEIT SYSTEMS CO., LTD., Shandong (CN)
PCT Filed Nov. 15, 2022, PCT No. PCT/CN2022/132053
§ 371(c)(1), (2) Date Jun. 18, 2024,
PCT Pub. No. WO2023/207035, PCT Pub. Date Nov. 2, 2023.
Claims priority of application No. 202210468218.4 (CN), filed on Apr. 29, 2022.
Prior Publication US 2024/0422067 A1, Dec. 19, 2024
Int. Cl. H04L 41/12 (2022.01); G06F 13/40 (2006.01); G06F 13/42 (2006.01); G06N 3/08 (2023.01); G11B 27/10 (2006.01); H04L 49/90 (2022.01)
CPC H04L 41/12 (2013.01) [G06F 13/4009 (2013.01); G06F 13/4221 (2013.01); G11B 27/10 (2013.01); G06F 2213/0026 (2013.01)] 16 Claims
 
1. A data synchronization method, comprising:
constructing first level physical topologies of ring structures having a quantity consistent with a quantity of types of acceleration devices among acceleration devices of the same type in a target server, and constructing second level physical topologies of ring structures having a quantity consistent with a quantity of acceleration devices of the same type among different types of acceleration devices in the target server, wherein different types of acceleration devices supporting a cache coherence protocol are provided in the target server, the quantities of acceleration devices of the different types are the same and each at least two, and the acceleration devices in the second level physical topologies are connected through the cache coherence protocol;
performing a first processing on to-be-synchronized data related to model training in acceleration devices of the same type through scatter reduce communication based on the first level physical topologies, and performing a second processing on data after the first processing in different types of acceleration devices through scatter reduce communication based on the second level physical topologies; and
performing a third processing on data after the second processing in different types of acceleration devices through all gather communication based on the second level physical topologies, and performing a fourth processing on data after the third processing in acceleration devices of the same type through all gather communication based on the first level physical topologies;
wherein a physical connection between different types of acceleration devices comprises a connection using a central processing unit (CPU) as an intermediate medium and a connection using the cache coherence protocol; and
before the constructing the first level physical topologies of ring structures having the quantity consistent with the quantity of types of acceleration devices among acceleration devices of the same type in the target server, and constructing the second level physical topologies of ring structures having the quantity consistent with the quantity of acceleration devices of the same type among different types of acceleration devices in the target server, the method further comprises:
determining whether a bandwidth of data transmission between different types of acceleration devices through the connection using CPU as the intermediate medium is higher than a bandwidth of data transmission between different types of acceleration devices through the connection using the cache coherence protocol; and
using the connection using the cache coherence protocol as the physical connection between different types of acceleration devices when the bandwidth of data transmission between different types of acceleration devices through the connection using CPU as the intermediate medium is higher than the bandwidth of data transmission between different types of acceleration devices through the connection using the cache coherence protocol.
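The four processing steps of claim 1 follow the pattern of a two-level ring all-reduce. The Python sketch below simulates that data movement in a single process under stated assumptions: the reduction is elementwise summation, every buffer length is divisible by (number of types × devices per type), `(t, i)` denotes the i-th device of type t, and the names `ring_scatter_reduce`, `ring_all_gather`, and `hierarchical_all_reduce` are hypothetical. Plain Python lists stand in for device memory; no real interconnect, CPU path, or cache coherence protocol is modeled, and this is not presented as the patentee's implementation.

```python
from typing import Dict, List, Tuple

Device = Tuple[int, int]  # hypothetical id: (type_index, index_within_type)


def _span(lo: int, hi: int, k: int, c: int) -> Tuple[int, int]:
    """Bounds of chunk c when region [lo, hi) is split into k equal chunks."""
    size = (hi - lo) // k
    return lo + c * size, lo + (c + 1) * size


def ring_scatter_reduce(bufs: List[List[int]], lo: int, hi: int) -> List[Tuple[int, int]]:
    """Ring scatter reduce over region [lo, hi) of k buffers, in place.

    After k - 1 steps, bufs[r] holds the full sum of chunk (r + 1) % k.
    Returns those owned chunk bounds, one per participant.
    """
    k = len(bufs)
    for step in range(k - 1):
        # Copy every outgoing chunk first so all sends in a step are simultaneous.
        sends = []
        for r in range(k):
            s, e = _span(lo, hi, k, (r - step) % k)
            sends.append((r, s, e, bufs[r][s:e]))
        for r, s, e, chunk in sends:
            dst = bufs[(r + 1) % k]
            dst[s:e] = [a + b for a, b in zip(dst[s:e], chunk)]
    return [_span(lo, hi, k, (r + 1) % k) for r in range(k)]


def ring_all_gather(bufs: List[List[int]], lo: int, hi: int) -> None:
    """Ring all gather over [lo, hi): bufs[r] starts owning chunk (r + 1) % k,
    the layout ring_scatter_reduce leaves; afterwards every buffer has all chunks."""
    k = len(bufs)
    for step in range(k - 1):
        sends = []
        for r in range(k):
            s, e = _span(lo, hi, k, (r + 1 - step) % k)
            sends.append((r, s, e, bufs[r][s:e]))
        for r, s, e, chunk in sends:
            bufs[(r + 1) % k][s:e] = chunk


def hierarchical_all_reduce(
    data: Dict[Device, List[int]], num_types: int, per_type: int
) -> Dict[Device, List[int]]:
    """Sum every device's buffer using the two ring levels of claim 1 (in place)."""
    if num_types < 2 or per_type < 2:
        raise ValueError("need at least two types with at least two devices each")
    length = len(data[(0, 0)])
    if length % (num_types * per_type):
        raise ValueError("buffer length must divide evenly across all devices")
    # First level: one ring per type, linking all devices of that type.
    first = [[data[(t, i)] for i in range(per_type)] for t in range(num_types)]
    # Second level: one ring per index i, linking the i-th device of every type.
    second = [[data[(t, i)] for t in range(num_types)] for i in range(per_type)]

    # First processing: scatter reduce within each same-type ring.
    owned = [ring_scatter_reduce(ring, 0, length) for ring in first]
    # Device i of every type now owns the same chunk, so owned[0][i] is shared.
    for i, ring in enumerate(second):
        lo, hi = owned[0][i]
        ring_scatter_reduce(ring, lo, hi)  # second processing: across types
    for i, ring in enumerate(second):
        lo, hi = owned[0][i]
        ring_all_gather(ring, lo, hi)      # third processing: across types
    for ring in first:
        ring_all_gather(ring, 0, length)   # fourth processing: within each type
    return data
```

With two types of two devices each and every buffer initialized to `[1, 1, 1, 1]`, every buffer ends as `[4, 4, 4, 4]`. The second level lines up because, after the first-level scatter reduce, device i of every type owns the same chunk index, so each second-level ring reduces a chunk that all of its members agree on.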
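The pre-construction check in the final clauses of claim 1 reduces to a comparison of two measured bandwidths. The Python sketch below mirrors that condition exactly as the claim states it: when the CPU-mediated connection has the higher bandwidth, the cache coherence connection is used. The claim does not state what happens in the opposite case, so the sketch returns `None` there rather than guessing. The helper names, the link label, and deriving bandwidth from a timed transfer (bytes divided by seconds) are assumptions for illustration only; the claim does not say how bandwidth is measured.

```python
from typing import Optional

# Hypothetical label for the connection the claim selects.
CACHE_COHERENCE_LINK = "cache_coherence"


def measured_bandwidth(bytes_moved: int, seconds: float) -> float:
    """Hypothetical helper: bandwidth of a timed transfer, in bytes per second."""
    if seconds <= 0:
        raise ValueError("transfer time must be positive")
    return bytes_moved / seconds


def select_cross_type_link(cpu_mediated_bw: float, cache_coherence_bw: float) -> Optional[str]:
    """Apply the rule as written in the claim: if the CPU-mediated bandwidth is
    higher than the cache coherence bandwidth, use the cache coherence
    connection. The claim is silent on the other case, so None is returned."""
    if cpu_mediated_bw > cache_coherence_bw:
        return CACHE_COHERENCE_LINK
    return None
```

For example, 1,000,000 bytes moved in 0.5 s over the CPU-mediated path (2,000,000 B/s) against the same transfer in 1.0 s over the coherent path yields `"cache_coherence"`, while equal bandwidths yield `None`.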