US 11,750,441 B1
Propagating node failure errors to TCP sockets
Reji Thomas, Bangalore (IN); Harmeet Singh, Bangalore (IN); Amit Arora, Bangalore (IN); Jimmy Jose, Hosur (IN); Sairam Neelam, Hyderabad (IN); and Vinod Arumugham Chettiar, Bangalore (IN)
Assigned to Juniper Networks, Inc., Sunnyvale, CA (US)
Filed by Juniper Networks, Inc., Sunnyvale, CA (US)
Filed on Sep. 7, 2018, as Appl. No. 16/125,369.
Int. Cl. H04L 41/0677 (2022.01); H04L 67/145 (2022.01); H04L 69/16 (2022.01); H04L 61/5007 (2022.01)
CPC H04L 41/0677 (2013.01) [H04L 61/5007 (2022.05); H04L 67/145 (2013.01); H04L 69/16 (2013.01)]
17 Claims
 
1. A method comprising:
detecting, at a first node, a failure of a link between the first node and a second node using a link level keep-alive protocol, wherein the first node and the second node are within a multi-node system;
determining, at the first node and in response to detecting the failure using the link level keep-alive protocol, one or more transmission control protocol (TCP) sockets of a plurality of TCP sockets on the first node that are communicating over the link between the first node and the second node;
prior to a read or write action between the first node and the second node being initiated in user space of the first node over a particular TCP socket of the determined one or more TCP sockets, and in response to detecting the failure using the link level keep-alive protocol, writing information accessible to a TCP stack in kernel space of the first node for the determined one or more TCP sockets, the information indicating that the determined one or more TCP sockets have an error;
reading, at the first node, the information accessible to the TCP stack in response to the read or write action; and
remediating, at the first node, the particular TCP socket of the determined one or more TCP sockets in response to reading the information.
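The steps of claim 1 can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the patented implementation: the names `SimSocket`, `NodeSim`, `on_link_failure`, and `pending_error` are hypothetical, the per-socket error field merely stands in for "information accessible to a TCP stack in kernel space", the choice of `EHOSTUNREACH` as the error code is an assumption (the claim does not name one), and remediation is modeled as simply closing the socket.

```python
import errno
import os
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimSocket:
    """Hypothetical stand-in for one TCP socket entry on the first node."""
    sock_id: int
    peer_node: str          # node at the far end of the link this socket uses
    pending_error: int = 0  # stands in for the error information the TCP stack reads
    is_open: bool = True


class NodeSim:
    """Hypothetical model of the first node; not a real kernel interface."""

    def __init__(self) -> None:
        self.sockets: Dict[int, SimSocket] = {}

    def add_socket(self, sock_id: int, peer_node: str) -> None:
        self.sockets[sock_id] = SimSocket(sock_id, peer_node)

    def on_link_failure(self, peer_node: str) -> List[int]:
        """Invoked when the link-level keep-alive reports a failed link.

        Determines which open sockets communicate over that link and writes
        the error for each one before any read or write is attempted.
        """
        affected = [s for s in self.sockets.values()
                    if s.peer_node == peer_node and s.is_open]
        for s in affected:
            s.pending_error = errno.EHOSTUNREACH  # assumed error code
        return [s.sock_id for s in affected]

    def read(self, sock_id: int) -> bytes:
        """A read action: the stored error is consulted first."""
        s = self.sockets[sock_id]
        if s.pending_error:
            err = s.pending_error
            self.remediate(s)
            raise OSError(err, os.strerror(err))
        return b""  # no data is modeled in this sketch

    def remediate(self, s: SimSocket) -> None:
        """Remediation modeled as closing the socket (an assumption)."""
        s.is_open = False


if __name__ == "__main__":
    node = NodeSim()
    node.add_socket(1, "node-b")
    node.add_socket(2, "node-c")
    print("marked:", node.on_link_failure("node-b"))
    try:
        node.read(1)
    except OSError as exc:
        print("read failed immediately, errno =", exc.errno)
```

In this sketch the application learns of the failure on its first read after the link goes down, because the error was recorded when the keep-alive detected the failure rather than after a TCP retransmission timeout. Sockets to other nodes, such as socket 2 here, are unaffected.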