US 12,321,619 B2
Aggregating small remote memory access requests
Duncan Roweth, Bristol (GB); Robert L. Alverson, Seattle, WA (US); Nathan L. Wichmann, Plymouth, MN (US); and Eric P. Lundberg, Eau Claire, WI (US)
Assigned to Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed by Hewlett Packard Enterprise Development LP, Houston, TX (US)
Filed on Oct. 28, 2022, as Appl. No. 17/976,728.
Prior Publication US 2024/0143198 A1, May 2, 2024
Int. Cl. G06F 3/06 (2006.01)
CPC G06F 3/0625 (2013.01) [G06F 3/0659 (2013.01); G06F 3/067 (2013.01)]
20 Claims
 
1. A computer-implemented method comprising:
receiving, by a local network interface card (NIC), a stream of commands, wherein a respective command comprises a first plurality of memory-operation requests, wherein each request is associated with a remote destination NIC and a remote destination core;
buffering asynchronously the requests into a first plurality of queues based on the destination NIC associated with each request, wherein each queue of the first plurality of queues is specific to a corresponding remote destination NIC, and wherein a number of the first plurality of queues is based on a number of remote destination NICs with which the local NIC is capable of communicating;
responsive to determining that a total size of the requests stored in a first queue, specific to a first corresponding remote destination NIC, of the first plurality of queues reaches a predetermined threshold, aggregating the requests stored in the first queue into a first packet and sending the first packet to the remote destination NIC over a high-bandwidth network;
receiving, by the local NIC, a plurality of packets, wherein a second packet of the received packets comprises a second plurality of memory-operation requests, wherein each request is destined to the local NIC and associated with a local destination core;
buffering asynchronously the requests of the second packet into a second plurality of queues based on the local destination core associated with each request, wherein each queue of the second plurality of queues is specific to a corresponding local destination core, and wherein a number of the second plurality of queues is based on a number of local destination cores; and
responsive to determining that a total size of the requests stored in a second queue, specific to a first corresponding local destination core, of the second plurality of queues reaches the predetermined threshold, aggregating the requests stored in the second queue into a third packet and sending the third packet to the local destination core.
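The two halves of claim 1 follow the same pattern: sort incoming requests into one queue per destination, and flush a queue as a single packet once the total size of its contents reaches a threshold. The Python sketch below is only an illustration of that pattern, not the patented implementation. All names (Request, Aggregator, make_nic, to_network, to_core) are hypothetical, request size is assumed to be the payload length in bytes, and the buffering is shown synchronously for brevity even though the claim recites asynchronous buffering.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Request:
    """Hypothetical small memory-operation request."""
    dest_nic: int    # id of the remote destination NIC
    dest_core: int   # id of the destination core behind that NIC
    payload: bytes   # request body; its length is taken as the request size


class Aggregator:
    """One queue per destination key; a queue is flushed as one packet
    when the total size of its buffered requests reaches the threshold."""

    def __init__(self, keys: Iterable[int], key_of: Callable[[Request], int],
                 threshold: int,
                 send: Callable[[int, List[Request]], None]) -> None:
        self.key_of = key_of
        self.threshold = threshold
        self.send = send
        # The number of queues is fixed by the number of possible destinations.
        self.queues: Dict[int, List[Request]] = {k: [] for k in keys}
        self.sizes: Dict[int, int] = {k: 0 for k in self.queues}

    def push(self, req: Request) -> None:
        key = self.key_of(req)
        self.queues[key].append(req)
        self.sizes[key] += len(req.payload)
        if self.sizes[key] >= self.threshold:
            # Swap in an empty queue, then hand the old contents over as one packet.
            packet, self.queues[key] = self.queues[key], []
            self.sizes[key] = 0
            self.send(key, packet)


def make_nic(num_remote_nics: int, num_local_cores: int, threshold: int,
             to_network: Callable[[int, List[Request]], None],
             to_core: Callable[[int, List[Request]], None]):
    """Build the send side (keyed by destination NIC) and the receive side
    (keyed by local destination core), both using the same threshold."""
    egress = Aggregator(range(num_remote_nics), lambda r: r.dest_nic,
                        threshold, to_network)
    ingress = Aggregator(range(num_local_cores), lambda r: r.dest_core,
                         threshold, to_core)
    return egress, ingress
```

In this sketch, requests taken from the command stream would be passed to egress.push, and each request unpacked from a received packet would be passed to ingress.push. A real NIC would presumably also flush on a timeout so that a partially filled queue does not wait indefinitely; that is an assumption and is not recited in claim 1.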