US 12,335,142 B2
Network interface for data transport in heterogeneous computing environments
Pratik M. Marolia, Hillsboro, OR (US); Rajesh M. Sankaran, Portland, OR (US); Ashok Raj, Portland, OR (US); Nrupal Jani, Hillsboro, OR (US); Parthasarathy Sarangam, Portland, OR (US); and Robert O. Sharp, Austin, TX (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jan. 19, 2024, as Appl. No. 18/417,570.
Application 17/129,756 is a division of application No. 16/435,328, filed on Jun. 7, 2019, granted, now 11,025,544, issued on Jun. 1, 2021.
Application 18/417,570 is a continuation of application No. 17/129,756, filed on Dec. 21, 2020, granted, now 11,929,927.
Prior Publication US 2024/0314072 A1, Sep. 19, 2024
Int. Cl. H04L 45/60 (2022.01); G06F 12/1081 (2016.01); G06F 13/28 (2006.01); H04L 45/74 (2022.01); H04L 49/90 (2022.01)
CPC H04L 45/742 (2013.01) [G06F 12/1081 (2013.01); G06F 13/28 (2013.01); H04L 45/60 (2013.01); H04L 49/9068 (2013.01)] 33 Claims
OG exemplary drawing
 
1. Network interface controller circuitry configurable for use in a host node and in association with a device driver, the host node comprising at least one graphics processing unit (GPU)-accessible memory, at least one host memory, and at least one host fabric, the host node to be communicatively coupled via at least one multi-switch fabric to a remote system, the remote system comprising at least one other GPU-accessible memory, at least one other host memory, and at least one other host fabric, the network interface controller circuitry comprising:
network interface circuitry for use in Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) packet data communication with the remote system via the at least one multi-switch fabric, the RoCE packet data communication to indicate at least one RDMA write to the host node from the remote system and/or at least one RDMA read from the host node to the remote system, the RoCE packet data communication to be initiated in response, at least in part, to at least one host application request; and
programmable circuitry to perform operations comprising:
in event that the RoCE packet data communication indicates the at least one RDMA write, directly writing, via the at least one host fabric, received packet data to the at least one GPU-accessible memory;
in event that the RoCE packet data communication indicates the at least one RDMA read, directly reading, via the at least one host fabric, other data from the at least one GPU-accessible memory that is to be provided to the remote system via the RoCE packet data communication; and
encryption, decryption, and compression-related host central processing unit (CPU) offload operations;
wherein:
the writing and the reading are to be performed in a manner that bypasses both (1) host CPU and/or host operating system (OS) in the writing and the reading, and (2) copying of the received packet data and the other data to the at least one host memory of the host node;
the writing and/or the reading are configurable to comprise use of direct data placement (DDP);
the writing and/or the reading are configurable to comprise use of address translation;
the address translation is to be implemented, at least in part, using the device driver;
portions of the received packet data and/or the other data are to be routed to their destinations via respective fabric-associated routings;
the respective fabric-associated routings are configurable to be mutually different from each other, at least in part; and
the at least one multi-switch fabric is to communicatively couple multiple switches associated with the host node and the remote system.
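
The data path recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuitry or any vendor API: the names `NicModel`, `Packet`, `Op`, `page_table`, and `host_memory_copies` are hypothetical, the GPU-accessible memory is modeled as a `bytearray`, and the driver-installed address translation is modeled as a plain page-number dictionary. The model shows an RDMA write placed directly into GPU-accessible memory and an RDMA read served directly from it, with no staging copy in host memory.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    RDMA_WRITE = "write"  # remote system writes into the host node
    RDMA_READ = "read"    # data is read from the host node for the remote system


@dataclass
class Packet:
    op: Op
    virt_addr: int        # address as seen by the requesting application
    length: int           # number of bytes to read (ignored for writes)
    payload: bytes = b""  # data carried by an RDMA write


@dataclass
class NicModel:
    """Hypothetical model of the claimed data path (illustration only)."""
    gpu_memory: bytearray        # stands in for GPU-accessible memory
    page_table: dict             # virtual page -> GPU page, assumed set up by the driver
    page_size: int = 4096
    host_memory_copies: int = 0  # never incremented: host memory is bypassed

    def translate(self, virt_addr: int, length: int) -> int:
        # Address translation; this sketch assumes an access stays within one page.
        page, offset = divmod(virt_addr, self.page_size)
        if offset + length > self.page_size:
            raise ValueError("access crosses a page boundary (not modeled)")
        return self.page_table[page] * self.page_size + offset

    def handle(self, pkt: Packet) -> bytes:
        if pkt.op is Op.RDMA_WRITE:
            phys = self.translate(pkt.virt_addr, len(pkt.payload))
            # Direct placement: the payload lands at its final GPU address.
            self.gpu_memory[phys:phys + len(pkt.payload)] = pkt.payload
            return b""
        phys = self.translate(pkt.virt_addr, pkt.length)
        # Direct read: bytes are taken straight from GPU-accessible memory.
        return bytes(self.gpu_memory[phys:phys + pkt.length])


# Usage: virtual page 5 is mapped to GPU page 1 in a two-page GPU memory.
nic = NicModel(gpu_memory=bytearray(2 * 4096), page_table={5: 1})
nic.handle(Packet(Op.RDMA_WRITE, virt_addr=5 * 4096 + 16, length=5, payload=b"hello"))
data = nic.handle(Packet(Op.RDMA_READ, virt_addr=5 * 4096 + 16, length=5))
```

In this model the write and the read both resolve to the same translated GPU address, and the `host_memory_copies` counter stays at zero, which mirrors the bypass recited in the wherein clauses. The offload operations and the per-fabric routing recited in the claim are not modeled.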