
I have a server program that uses TCP.

Sometimes I need to restart the program, for updates or other reasons. When I do, the program closes the server port, and once it restarts it tries to create a new listener on that port. That only seems to succeed about a minute (~63 seconds) after the port was previously closed. Why is this, and is there any way to fix it?

The program is running on RamNode's Ubuntu 18.04.

Are there any settings I can change in the OS, or is it perhaps a RamNode thing?

Spiff
NS studios

1 Answer


For applications dealing with TCP sockets, it's most likely due to the TIME-WAIT state of the TCP protocol:

TIME-WAIT - represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.

When a connection is closed, one of the two sides (whichever initiated the close) keeps the TCP socket in TIME-WAIT state for some time, to prevent an identical TCP connection from existing. On the client side this is not a problem, because the port is a random ephemeral port. On the server side, with a fixed port, it prevents a restarted server from binding to the same port again.


Here is an example using socat to reproduce the behavior.

term1:

socat tcp4-listen:5555 -

term2:

socat tcp4:127.0.0.1:5555 -

Now that the connection is established, interrupt term1's socat command. If you restart it right away, you hit the one-minute delay:

term1:

$ socat tcp4-listen:5555 -
2021/07/03 21:32:09 socat[320904] E bind(5, {AF=2 0.0.0.0:5555}, 16): Address already in use
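
The same failure can be reproduced without socat. Here is a minimal Python sketch, assuming Linux and a working loopback interface. The function name is made up for illustration, and the port is picked by the OS instead of being fixed at 5555.

```python
import errno
import socket


def restart_fails_without_reuseaddr() -> bool:
    """Return True if rebinding right after a server-side close hits EADDRINUSE."""
    # First "run" of the server: listen on a free port picked by the OS.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = listener.accept()

    # The server closes first, so its side of the connection is the one
    # that lingers (ending in TIME-WAIT) after the program has "stopped".
    conn.close()
    client.recv(1)  # returns b"" once the server's FIN has arrived
    client.close()
    listener.close()

    # Second "run": a plain bind() on the same port, without SO_REUSEADDR.
    restarted = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        restarted.bind(("127.0.0.1", port))
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    else:
        return False
    finally:
        restarted.close()
```

On Linux the second bind() should be refused with the same "Address already in use" error that socat reports.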

The feature to prevent this behavior, especially useful for servers, is the socket option SO_REUSEADDR:

SO_REUSEADDR

Indicates that the rules used in validating addresses supplied in a bind(2) call should allow reuse of local addresses. For AF_INET sockets this means that a socket may bind, except when there is an active listening socket bound to the address. When the listening socket is bound to INADDR_ANY with a specific port then it is not possible to bind to this port for any local address. Argument is an integer boolean flag.

Instead, when first starting the listening socat with:

socat tcp4-listen:5555,reuseaddr -

even if interrupted after a connection as before, it can be started again right away: bind() succeeds even though a TCP connection (including a former connection in TIME-WAIT state) still uses this port, provided the previous run was also using SO_REUSEADDR.
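
The man page rule quoted earlier also means SO_REUSEADDR does not let a second server take the port while a listener is still running. Here is a small Python sketch of that, assuming Linux semantics (other systems treat this option differently). `second_bind_errno` is a hypothetical helper name.

```python
import errno
import socket


def second_bind_errno():
    """Bind a second socket to a port that already has an active listener.

    Both sockets set SO_REUSEADDR. Returns the errno of the failed bind(),
    or None if the bind unexpectedly succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        for sock in (first, second):
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

        # Active listener on INADDR_ANY, port picked by the OS.
        first.bind(("0.0.0.0", 0))
        first.listen()
        port = first.getsockname()[1]

        try:
            # Same port, specific local address: still covered by the
            # wildcard listener according to the quoted rule.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()
```

On Linux this is expected to return `errno.EADDRINUSE`, so the option does not risk two instances of the server silently sharing the port.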

Here's a weird example (showing it's not limited to TIME-WAIT states):

term1:

socat tcp4:127.0.0.1:5555,bind=127.0.0.1:5555,reuseaddr -

(The command above connects to itself through a simultaneous TCP open, as allowed by RFC 793.)

$ ss -tn sport == 5555
State  Recv-Q  Send-Q   Local Address:Port    Peer Address:Port  
ESTAB  0       0            127.0.0.1:5555       127.0.0.1:5555  

This does not prevent a listening socket from being created afterwards on port 5555:

term2:

socat tcp4-listen:5555,reuseaddr -

term3:

$ ss -atnp sport == 5555
State  Recv-Q Send-Q Local Address:Port   Peer Address:Port                                    
LISTEN 0      5            0.0.0.0:5555        0.0.0.0:*     users:(("socat",pid=321093,fd=5)) 
ESTAB  0      0          127.0.0.1:5555      127.0.0.1:5555  users:(("socat",pid=321047,fd=5))

$ socat -d -d tcp4:127.0.0.1:5555 -
2021/07/03 22:16:27 socat[322125] N opening connection to AF=2 127.0.0.1:5555
2021/07/03 22:16:27 socat[322125] N successfully connected from local address AF=2 127.0.0.1:52218
2021/07/03 22:16:27 socat[322125] N reading from and writing to stdio
2021/07/03 22:16:27 socat[322125] N starting data transfer loop with FDs [5,5] and [0,1]


Back to the problem: the cause is in your application, not in the OS or RamNode.

Either:

  • correct the application to call setsockopt() with SO_REUSEADDR on sockets that will be listening, before bind(), so that bind() succeeds even when a connection using that port still exists (TIME-WAIT counts as such a connection and is probably the only cause in real cases).
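
For this first option, the change is usually a single setsockopt() call placed before bind(). Here is a minimal sketch in Python, whose socket module wraps the same system calls. `make_listener` and its parameters are illustrative names, not part of the asker's program, which may be written in another language.

```python
import socket


def make_listener(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Create a listening TCP socket that can be rebound right after a restart."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Must be set before bind(): the option changes how bind() validates
        # the address and has no effect once the socket is already bound.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```

A server would call something like `make_listener(5555)` at startup. In C the equivalent is a `setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one))` call on the socket before `bind()`.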

or if you can't, on a dynamically linked (with libc) application:

A.B