CPC H04L 41/0677 (2013.01) [H04L 61/5007 (2022.05); H04L 67/145 (2013.01); H04L 69/16 (2013.01)] | 17 Claims |
1. A method comprising:
detecting, at a first node, a failure of a link between the first node and a second node using a link level keep-alive protocol, wherein the first node and the second node are within a multi-node system;
determining, at the first node and in response to detecting the failure using the link level keep-alive protocol, one or more transmission control protocol (TCP) sockets of a plurality of TCP sockets on the first node that are communicating over the link between the first node and the second node;
prior to a read or write action between the first node and the second node being initiated in user space of the first node over a particular TCP socket of the determined one or more TCP sockets, and in response to detecting the failure using the link level keep-alive protocol, writing information accessible to a TCP stack in kernel space of the first node for the determined one or more TCP sockets, the information indicating that the determined one or more TCP sockets have an error;
reading, at the first node, the information accessible to the TCP stack in response to the read or write action; and
remediating, at the first node, the particular TCP socket of the determined one or more TCP sockets in response to reading the information.
|
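The steps of claim 1 can be illustrated with a minimal user-space simulation. This is a sketch under stated assumptions, not the claimed kernel implementation: `SimSocket`, `Node`, `on_link_failure`, `send`, and `remediate` are hypothetical names, the `pending_error` field only stands in for the information accessible to the TCP stack, and closing and dropping the socket is one assumed form of remediation.

```python
import errno
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SimSocket:
    """Hypothetical stand-in for a TCP socket record kept by the stack."""
    sock_id: int
    peer: str                            # node at the other end of the link
    pending_error: Optional[int] = None  # models the per-socket error information
    is_open: bool = True


@dataclass
class Node:
    """Hypothetical model of the first node and its table of TCP sockets."""
    sockets: Dict[int, SimSocket] = field(default_factory=dict)

    def connect(self, sock_id: int, peer: str) -> None:
        self.sockets[sock_id] = SimSocket(sock_id, peer)

    def on_link_failure(self, peer: str) -> List[int]:
        """Invoked when an (assumed) link-level keep-alive monitor reports a failure.

        Determines which sockets use the failed link and marks each one with
        an error before any read or write is attempted on it.
        """
        affected = [s for s in self.sockets.values() if s.is_open and s.peer == peer]
        for s in affected:
            s.pending_error = errno.ECONNRESET
        return [s.sock_id for s in affected]

    def send(self, sock_id: int, data: bytes) -> int:
        """Models a write initiated from user space."""
        s = self.sockets[sock_id]
        if s.pending_error is not None:
            # The error information is read in response to the write action,
            # and the socket is remediated before the caller is told.
            err = s.pending_error
            self.remediate(s)
            raise ConnectionResetError(err, f"link to {s.peer} failed")
        return len(data)  # pretend the payload was transmitted

    def remediate(self, s: SimSocket) -> None:
        """Assumed remediation: close the socket and drop it from the table."""
        s.is_open = False
        del self.sockets[s.sock_id]
```

In this sketch a write on a marked socket fails immediately instead of waiting for TCP retransmission timeouts, while sockets on other links are unaffected. A socket that was marked but never used stays in the table with its error recorded until a read or write touches it.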