US 12,450,484 B2
Communication optimizations for distributed machine learning
Srinivas Sridharan, Bangalore (IN); Karthikeyan Vaidyanathan, Bangalore (IN); Dipankar Das, Pune (IN); Chandrasekaran Sakthivel, Sunnyvale, CA (US); and Mikhail E. Smorkalov, Nizhniy Novgorod (RU)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on May 19, 2023, as Appl. No. 18/320,385.
Application 18/320,385 is a continuation of application No. 17/685,462, filed on Mar. 3, 2022, granted, now 11,704,565.
Application 17/685,462 is a continuation of application No. 15/859,180, filed on Dec. 29, 2017, granted, now 11,270,201, issued on Mar. 8, 2022.
Prior Publication US 2023/0376762 A1, Nov. 23, 2023
Int. Cl. G06N 3/08 (2023.01); G06F 9/50 (2006.01); G06N 3/04 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01); G06N 3/063 (2023.01); G06N 3/084 (2023.01); G06N 3/088 (2023.01); G06N 7/01 (2023.01)
CPC G06N 3/08 (2013.01) [G06F 9/50 (2013.01); G06F 9/5061 (2013.01); G06F 9/5077 (2013.01); G06N 3/04 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01); G06N 3/088 (2013.01); G06N 3/048 (2023.01); G06N 7/01 (2023.01)] 20 Claims
OG exemplary drawing
 
1. An apparatus comprising:
an interconnect switch configured to couple with a plurality of graphics processors via a plurality of point-to-point interconnects; and
one or more processors including a graphics processor, the graphics processor coupled with the interconnect switch via a point-to-point interconnect of the plurality of point-to-point interconnects, the graphics processor comprising:
a cluster of graphics multiprocessors configured for single instruction multiple thread (SIMT) operation, the cluster of graphics multiprocessors interconnected via a data interconnect and configured to exchange data via the data interconnect, the cluster of graphics multiprocessors including a graphics multiprocessor configured to:
receive data associated with a first thread group to be executed via the graphics multiprocessor during execution of operations associated with a second thread group, the data to be received via a point-to-point interconnect coupled with the interconnect switch in association with a communication pattern for messages to be transmitted between worker nodes of a first group of worker nodes configured to perform distributed training of a neural network; and
transmit data processed by the second thread group via the point-to-point interconnect coupled with the interconnect switch during execution of operations associated with the first thread group.
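The claim describes overlapping communication with computation: one thread group receives or transmits data over the interconnect while another thread group executes compute operations, so interconnect transfers for the next step hide behind the current step's work. The sketch below is an illustrative model of that overlap pattern only, not the patented implementation; the function names (`distributed_step_overlap`, `compute`, `exchange`) are hypothetical stand-ins for a worker node's compute kernel and its point-to-point message exchange.

```python
import threading
import queue


def distributed_step_overlap(num_steps, compute, exchange):
    """Run num_steps steps, prefetching step i+1's data (the "first
    thread group" receiving over the interconnect) while step i's
    data is processed (the "second thread group" computing)."""
    inbox = queue.Queue()          # FIFO: data arrives in step order
    results = []
    inbox.put(exchange(0))         # prime the pipeline with step 0's data
    for step in range(num_steps):
        # Background "communication" thread: fetch the NEXT step's data
        # while this step's computation proceeds concurrently.
        t = None
        if step + 1 < num_steps:
            t = threading.Thread(
                target=lambda s=step + 1: inbox.put(exchange(s)))
            t.start()
        # Foreground "compute" thread group: process the data that has
        # already been received for the current step.
        results.append(compute(inbox.get()))
        if t:
            t.join()               # next step's data is now in the inbox
    return results
```

Because the queue is FIFO and each prefetch is joined before the next iteration, results are produced in step order even though transfer and compute run concurrently; for example, `distributed_step_overlap(4, lambda x: x * 2, lambda s: s)` yields `[0, 2, 4, 6]`.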