US 11,714,765 B2
System and method for implementing a network-interface-based allreduce operation
Keith D. Underwood, Powell, TN (US); Robert L. Alverson, Seattle, WA (US); Duncan Roweth, Bristol (GB); and Nathan L. Wichmann, Plymouth, MN (US)
Assigned to Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed by Hewlett Packard Enterprise Development LP, Houston, TX (US)
Filed on Jul. 23, 2021, as Appl. No. 17/383,606.
Prior Publication US 2023/0035657 A1, Feb. 2, 2023
Int. Cl. G06F 13/20 (2006.01); G06F 12/10 (2016.01)
CPC G06F 13/20 (2013.01) [G06F 12/10 (2013.01); G06F 2212/1024 (2013.01)]
18 Claims
 
1. An apparatus, comprising:
a network interface to transmit and receive data packets over a network;
a memory comprising one or more buffers;
an address translation logic for enabling a region in the memory to be marked as being cacheable;
an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and
a circuitry to:
receive, via the network interface, data packets from the network;
aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at network rate;
transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation.
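
For illustration only, the receive, aggregate, and transmit steps recited in claim 1 can be sketched in software. This is a minimal simulation under stated assumptions, not the claimed hardware: the function names `aggregate_packet` and `allreduce_sum`, the node labels, the use of element-wise summation as the combining operation, and the dictionary standing in for the network are all hypothetical and do not appear in the patent. The address translation logic and network-rate behavior are not modeled.

```python
from typing import Dict, List, Sequence


def aggregate_packet(buffer: List[float], payload: Sequence[float]) -> None:
    """Combine one received payload into the buffer element-wise.

    Stands in for the arithmetic logic unit; summation is an assumed
    choice of combining operation.
    """
    if len(payload) != len(buffer):
        raise ValueError("payload length must match buffer length")
    for i, value in enumerate(payload):
        buffer[i] += value


def allreduce_sum(contributions: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Simulate an allreduce over a set of hypothetical compute nodes.

    Each dictionary entry models one data packet received from a node.
    The aggregated result is returned to every node, modeling the
    transmit step.
    """
    if not contributions:
        return {}
    length = len(next(iter(contributions.values())))
    buffer = [0.0] * length  # models one buffer in the memory
    for payload in contributions.values():  # receive step
        aggregate_packet(buffer, payload)  # aggregate step
    # transmit step: every node gets its own copy of the aggregate
    return {node: list(buffer) for node in contributions}


result = allreduce_sum({
    "node0": [1.0, 2.0],
    "node1": [3.0, 4.0],
    "node2": [5.0, 6.0],
})
```

In this sketch every node ends up holding the same combined vector, which is the defining property of an allreduce operation; an actual implementation per the claim would perform the combining in the network interface rather than in host software.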