| CPC G06F 13/20 (2013.01) [G06F 12/10 (2013.01); G06F 2212/1024 (2013.01)] | 17 Claims |

|
14. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method, the method comprising:
performing a collective operation, wherein performing the collective operation comprises:
receiving packets containing data of a first segment of an input buffer from a plurality of peer compute nodes;
implementing a three-stage hierarchical allreduce operation,
wherein a third stage of the three-stage hierarchical allreduce operation comprises a local data fan-out based on at least one of:
a local multicast; or
a pipelined ring broadcast;
obtaining an aggregate result by processing the received data packets; and
sending the aggregate result to the plurality of peer compute nodes;
setting up a first network operation;
asynchronously triggering the first network operation in response to completion of a second network operation, wherein the first and second network operations are associated with the collective operation; and
adding flow control based on asynchronously triggered network operations.
|
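The three-stage hierarchical allreduce recited in claim 14 can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the claimed implementation. The claim specifies the third stage (a local fan-out by local multicast or pipelined ring broadcast) but not the first two, so the sketch assumes stage 1 is an intra-node reduction to a leader rank and stage 2 is a reduction across node leaders. The sum operator, the function names `elementwise_sum`, `pipelined_ring_broadcast`, and `hierarchical_allreduce`, and the `num_chunks` parameter are all hypothetical. No real networking, asynchronous triggering, or flow control is modeled.

```python
def elementwise_sum(vectors):
    """Sum equal-length vectors element by element (assumed reduction operator)."""
    return [sum(vals) for vals in zip(*vectors)]


def pipelined_ring_broadcast(result, num_local, num_chunks=2):
    """Stage 3 fan-out: the leader (local rank 0) streams chunks around a local ring.

    In step t, rank r forwards chunk (t - r) to rank r + 1, so different
    chunks are in flight on different ring links during the same step.
    """
    size = max(1, -(-len(result) // num_chunks))  # ceiling division, at least 1
    chunks = [result[i:i + size] for i in range(0, len(result), size)]
    held = [[] for _ in range(num_local)]
    held[0] = list(chunks)  # the leader already holds the full aggregate
    for step in range(len(chunks) + num_local - 2):
        for rank in range(num_local - 1):
            c = step - rank
            if 0 <= c < len(chunks):
                held[rank + 1].append(held[rank][c])
    # Reassemble each rank's chunks into a flat buffer.
    return [[x for chunk in h for x in chunk] for h in held]


def hierarchical_allreduce(node_buffers, num_chunks=2):
    """node_buffers[n][r] is the input buffer of local rank r on node n."""
    # Stage 1 (assumed): each node reduces its local buffers to its leader.
    leader_partials = [elementwise_sum(local) for local in node_buffers]
    # Stage 2 (assumed): leaders combine their partial results across nodes.
    aggregate = elementwise_sum(leader_partials)
    # Stage 3 (from the claim): local fan-out via a pipelined ring broadcast.
    return [
        pipelined_ring_broadcast(aggregate, len(local), num_chunks)
        for local in node_buffers
    ]


if __name__ == "__main__":
    nodes = [
        [[1, 2, 3, 4], [10, 20, 30, 40]],
        [[100, 200, 300, 400], [1000, 2000, 3000, 4000], [0, 0, 0, 1]],
    ]
    for n, ranks in enumerate(hierarchical_allreduce(nodes)):
        print(n, ranks)
```

In this sketch every local rank on every node ends up holding the same aggregate buffer. Splitting the aggregate into chunks is what lets the ring overlap transfers: with C chunks and L local ranks the fan-out takes C + L - 2 steps rather than C × (L - 1).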