[Twisted-Python] connectionLost never reached after calling loseConnection: stuck in CLOSE_WAIT forever

Stefano Debenedetti ste at demaledetti.net
Thu Oct 28 12:52:16 EDT 2010


On 18/10/2010 17:21, Stefano Debenedetti wrote:
> Anyway, I ran Twisted tests on my installation after the patch I
> mentioned in my previous mail and I got the same results as before
> applying it so at least it seems it doesn't break any obvious stuff.


Sorry for replying to myself, but for the record: the patch I sent
does break things (connections are sometimes closed before all data
has been sent), so don't use it.

The partially good news is that I managed to write a short,
self-contained example that reproduces the exact same problem I'm
witnessing in my app. The bad news is that it does so only about
50% of the time, but I thought I would share it while I keep
trying to make it more reliable.

Please find attached one .sh file and one .py file. Save them
somewhere and make them executable. You will also need netcat (nc).

If you run the .sh file and, after three seconds, type in a short
line of text followed by the enter key, you should see the line you
typed printed back many times on your terminal, along with quite a
lot of network activity on the localhost interface for about a
minute. Don't redirect the .sh output to /dev/null: the problem
seems to occur only when the terminal application you run it in gets
to 100% CPU while printing the data received by netcat. Hopefully
you have a multicore machine and this won't disrupt your desktop.

If you're lucky and nothing bad happens, after a while the .sh
script will terminate and all connections opened by it and the .py
file will be closed. Please remember to kill the three python
processes launched by the script before trying again.

If you're unlucky like I am, after a while all connections will be
closed except the one between netcat and one of the three servers
powered by the .py file.

That connection will be in this state according to netstat:

# netstat -np --inet 2> /dev/null | grep 127.0.0.1
tcp        0      0 127.0.0.1:8080          127.0.0.1:36815         ESTABLISHED 10042/python2.6
tcp        0      0 127.0.0.1:36815         127.0.0.1:8080          ESTABLISHED 10051/nc

If you then CTRL-C the .sh script so that netcat gets terminated,
you will get to the dreaded CLOSE_WAIT forever state:

# netstat -np --inet 2> /dev/null | grep 127.0.0.1
tcp        1      0 127.0.0.1:8080          127.0.0.1:36815         CLOSE_WAIT  10042/python2.6
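
For readers less familiar with this state: CLOSE_WAIT means the remote
end (here, netcat) has already closed its side of the connection, and
the kernel is waiting for the local application to close its own
socket. The following is only a minimal sketch with plain blocking
sockets from the Python standard library, not the attached
test_producer.py and not Twisted code. The function name
demo_peer_close and the payload are made up for illustration. It shows
the point at which a server sees the peer's close (recv returns an
empty string) and that the socket leaves CLOSE_WAIT only once the
application calls close().

```python
import socket


def demo_peer_close():
    """Illustrative only: hypothetical helper, not part of the attachments."""
    # Listen on an ephemeral localhost port so the sketch is self-contained.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    # The "client" plays the role of netcat in the report.
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5.0)  # avoid hanging forever if something goes wrong

    client.sendall(b"hello\n")
    client.close()  # the peer closes: a FIN is sent to the server side

    # Read until recv() returns an empty byte string, which signals that
    # the peer has closed its end of the connection.
    chunks = []
    while True:
        chunk = server_side.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)

    # From here until close() is called, the server-side socket is the one
    # that netstat would report as CLOSE_WAIT.
    server_side.close()
    listener.close()

    # fileno() is -1 once the socket object has been closed.
    return b"".join(chunks), server_side.fileno()
```

If the close() call on the server side never happens, which is what the
missing connectionLost call suggests in my case, the connection stays
in CLOSE_WAIT for as long as the process is alive.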


Please note that even though the .py file is called three times and
launches a different server application each time, the only one I'm
interested in is the first one ("one"), the other two are just there
to simulate the third-party apps that my server is dealing with.
This is why servers "two" and "three" do seemingly silly stuff
including closing some of their connections at some point.

My goal is that no matter how and when the client and the "two" and
"three" servers close their connections to "one", the client
connection to "one" is always properly terminated and never gets
stuck in the CLOSE_WAIT state.

Thanks for any feedback you might have,

ciao
ste



-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_producer.sh
Type: application/x-sh
Size: 260 bytes
Desc: not available
Url : http://twistedmatrix.com/pipermail/twisted-python/attachments/20101028/31bdef34/attachment.sh 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_producer.py
Type: text/x-python
Size: 4750 bytes
Desc: not available
Url : http://twistedmatrix.com/pipermail/twisted-python/attachments/20101028/31bdef34/attachment.py 

