US 12,346,274 B2
System and method for implementing a network-interface-based allreduce operation
Keith D. Underwood, Fort Collins, CO (US); Robert L. Alverson, Seattle, WA (US); Duncan Roweth, Bristol (GB); and Nathan L. Wichmann, Bloomington, MN (US)
Assigned to Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed by Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed on Jul. 17, 2023, as Appl. No. 18/353,277.
Application 18/353,277 is a continuation of application No. 17/383,606, filed on Jul. 23, 2021, granted, now 11,714,765.
Prior Publication US 2023/0359574 A1, Nov. 9, 2023
Int. Cl. G06F 13/20 (2006.01); G06F 12/10 (2016.01)
CPC G06F 13/20 (2013.01) [G06F 12/10 (2013.01); G06F 2212/1024 (2013.01)]
17 Claims
14. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method, the method comprising:
performing a collective operation, wherein performing the collective operation comprises:
receiving packets containing data of a first segment of an input buffer from a plurality of peer compute nodes;
implementing a three-stage hierarchical allreduce operation,
wherein a third stage of the three-stage hierarchical allreduce operation comprises a local data fan-out based on at least one of:
a local multicast; or
a pipelined ring broadcast;
obtaining an aggregate result by processing the received data packets; and
sending the aggregate result to the plurality of peer compute nodes;
setting up a first network operation;
asynchronously triggering the first network operation in response to completion of a second network operation, wherein the first and second network operations are associated with the collective operation; and
adding flow control based on asynchronously triggered network operations.
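
The steps recited in claim 14 can be illustrated with a minimal single-process Python sketch. It is not the patented implementation. It simulates the data movement in ordinary lists, and it assumes details the claim does not state: summation as the reduction operator, a first stage that reduces locally within each node, a second stage in which each node aggregates one segment of the buffer received from its peers, and a third stage that uses the pipelined ring broadcast option. The names `hierarchical_allreduce`, `pipelined_ring_broadcast`, and `TriggerCounter` are hypothetical. `TriggerCounter` models a counter that fires a previously set-up operation once enough earlier operations have completed.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def pipelined_ring_broadcast(data: Vector, num_procs: int, chunk: int) -> List[Vector]:
    """Fan data out from local process 0 along a ring, one chunk per step."""
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    bufs: List[List[Vector]] = [[] for _ in range(num_procs)]
    bufs[0] = list(chunks)  # the root already holds the full result
    # At step t, process k receives chunk t - (k - 1) from process k - 1,
    # so different chunks are in flight on different links at the same time.
    for t in range(len(chunks) + num_procs - 2):
        for k in range(1, num_procs):
            c = t - (k - 1)
            if 0 <= c < len(chunks):
                bufs[k].append(chunks[c])
    return [[x for part in b for x in part] for b in bufs]


def hierarchical_allreduce(nodes: List[List[Vector]], chunk: int = 2) -> List[List[Vector]]:
    """nodes[n][p] is the input buffer of local process p on compute node n."""
    # Stage 1 (assumed): reduce the buffers of the processes within each node.
    local = [[sum(col) for col in zip(*procs)] for procs in nodes]

    # Stage 2 (assumed): node `owner` receives segment `owner` from every
    # peer, aggregates it, and sends the aggregate back to all peers.
    num_nodes, length = len(nodes), len(local[0])
    seg = -(-length // num_nodes)  # ceiling division
    total: Vector = [0.0] * length
    for owner in range(num_nodes):
        for i in range(owner * seg, min((owner + 1) * seg, length)):
            total[i] = sum(peer[i] for peer in local)

    # Stage 3: local data fan-out, here via a pipelined ring broadcast.
    return [pipelined_ring_broadcast(total, len(procs), chunk) for procs in nodes]


class TriggerCounter:
    """Fires set-up operations once a completion count reaches a threshold."""

    def __init__(self) -> None:
        self.count = 0
        self.pending: List[Tuple[int, Callable[[], None]]] = []

    def arm(self, threshold: int, op: Callable[[], None]) -> None:
        """Set up an operation that runs after `threshold` completions."""
        self.pending.append((threshold, op))
        self._fire()

    def complete(self) -> None:
        """Record completion of an earlier operation."""
        self.count += 1
        self._fire()

    def _fire(self) -> None:
        ready = [p for p in self.pending if p[0] <= self.count]
        self.pending = [p for p in self.pending if p[0] > self.count]
        for _, op in ready:
            op()
```

In this sketch, arming the send of the aggregate with a threshold equal to the number of expected peer packets means the send fires without host involvement as soon as the last packet completes. The same threshold mechanism could, under these assumptions, serve as a simple form of flow control by holding back an operation until earlier ones have finished.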