CPC G06F 13/20 (2013.01) [G06F 12/10 (2013.01); G06F 2212/1024 (2013.01)] | 18 Claims
1. An apparatus, comprising:
a network interface to transmit and receive data packets over a network;
a memory comprising one or more buffers;
an address translation logic for enabling a region in the memory to be marked as being cacheable;
an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and
a circuitry to:
receive, via the network interface, data packets from the network;
aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at network rate;
transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation.
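For illustration only, the receive, aggregate, and transmit steps recited above can be sketched in software. The sketch below is not the claimed circuitry and makes several assumptions not stated in the claim: the reduction is an element-wise sum, every payload has the same length, and the names `aggregate_packets` and `allreduce_step` are hypothetical. A Python list stands in for one of the buffers, and a dictionary keyed by node name stands in for the network.

```python
from typing import Dict, List


def aggregate_packets(payloads: List[List[float]]) -> List[float]:
    """Element-wise sum of equally sized payloads.

    Assumption: the reduction operator is a sum; the claim only says
    the packets are "combined".
    """
    if not payloads:
        return []
    width = len(payloads[0])
    buffer = [0.0] * width  # stands in for one buffer in the memory
    for payload in payloads:
        if len(payload) != width:
            raise ValueError("all payloads must have the same length")
        for i, value in enumerate(payload):
            buffer[i] += value  # the arithmetic step of the aggregation
    return buffer


def allreduce_step(received: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Combine one payload per node and return the aggregate for each node.

    `received` maps a hypothetical node name to the payload it sent.
    Every node gets its own copy of the same aggregated result.
    """
    aggregated = aggregate_packets(list(received.values()))
    return {node: list(aggregated) for node in received}


if __name__ == "__main__":
    incoming = {"node0": [1.0, 2.0], "node1": [3.0, 4.0], "node2": [5.0, 6.0]}
    for node, result in allreduce_step(incoming).items():
        print(node, result)
```

In this sketch each node ends up holding the same reduced vector, which is the defining property of an allreduce; the claim performs the equivalent combination in hardware at network rate rather than on the compute nodes.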