[Twisted-Python] connectionLost never reached after calling loseConnection: stuck in CLOSE_WAIT forever

Sat Oct 16 09:34:52 MDT 2010

Apologies for the double posting. I am resending this because the
original message appears to have triggered an mbox format handling
bug somewhere: it shows up truncated at the first content line
starting with "From", as can be seen here:

http://twistedmatrix.com/pipermail/twisted-python/2010-October/023029.html

Sorry about this,
ste

-------- Original message --------
Subject: connectionLost never reached after calling loseConnection:
stuck in CLOSE_WAIT forever
Date: Sat, 16 Oct 2010 17:22:56 +0200
From: Stefano Debenedetti <ste at demaledetti.net>
To: twisted-python at twistedmatrix.com

Hi,

I have a problem similar to the one described in this old mail [1],
except that I am not using threads at all.

Basically, my app is a server that accepts connections from a client
(connection A). When A is established, it opens connections B and C
to another server, then forwards data from A to B and from C back to A.

Here is how data is forwarded:

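def forward(_from, to):
    # hand every chunk received on _from straight to the other transport
    _from.dataReceived = to.transport.write
    # True = streaming (push) producer, so "to" can pause reads on _from
    to.transport.registerProducer(_from.transport, True)

def loseConnection(proto, onlost=lambda *args: None):
    # replace connectionLost with the callback to run once proto is closed
    proto.connectionLost = onlost
    proto.transport.unregisterProducer()
    proto.transport.loseConnection()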

The forwarding is set up as soon as connectionMade has been called on
connections A, B and C:

forward(A_protocol, B_protocol)
forward(C_protocol, A_protocol)

As soon as the app decides its job is done and wants to tear down all
connections, it calls:

loseConnection(B_protocol)
loseConnection(C_protocol, lambda *a: loseConnection(A_protocol))
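
To make the wiring and the teardown order easier to follow, here is a
minimal sketch that runs without a reactor. FakeTransport and
FakeProtocol are hypothetical stand-ins, not Twisted classes, and the
fake loseConnection fires connectionLost synchronously, which a real
transport does not do. It shows only the intended call order, not the
bug itself.

```python
# Hypothetical stand-ins (NOT Twisted classes): just enough surface to
# exercise forward() and loseConnection() without a reactor.
class FakeTransport:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.producer = None
        self.written = []
        self.protocol = None

    def write(self, data):
        self.written.append(data)

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def loseConnection(self):
        self.log.append("loseConnection " + self.name)
        # Assumption: a real transport reports the close later, from the
        # reactor; it is called right away here to stay synchronous.
        self.protocol.connectionLost(None)


class FakeProtocol:
    def __init__(self, name, log):
        self.transport = FakeTransport(name, log)
        self.transport.protocol = self

    def dataReceived(self, data):
        pass

    def connectionLost(self, reason):
        pass


# The two helpers exactly as used in my app.
def forward(_from, to):
    _from.dataReceived = to.transport.write
    to.transport.registerProducer(_from.transport, True)


def loseConnection(proto, onlost=lambda *args: None):
    proto.connectionLost = onlost
    proto.transport.unregisterProducer()
    proto.transport.loseConnection()


log = []
A, B, C = (FakeProtocol(name, log) for name in "ABC")
forward(A, B)
forward(C, A)

A.dataReceived(b"request")    # ends up on B's transport
C.dataReceived(b"response")   # ends up on A's transport

loseConnection(B)
loseConnection(C, lambda *a: loseConnection(A))
print(log)  # teardown order: B, then C, then A
```

In this sketch A is torn down only from C's connectionLost callback,
which is the ordering I rely on in the real app.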

The debug logs in my code confirm that the loseConnection function
is always called with A_protocol as its argument. Yet sometimes, and
totally unpredictably as far as I can tell, connectionLost is not
called on some A_protocol instances, and the corresponding A
connections get stuck in the CLOSE_WAIT state.

Another hint is that netstat shows 1 byte in the Recv-Q for those
connections (and 0 in the Send-Q).
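
For reference, this is what that state means at the socket level: the
peer has closed its end, while the local side has neither read the
pending data nor closed its own socket. Below is a minimal stdlib-only
sketch of that situation. It does not use Twisted and I am not claiming
it is what happens inside my app; the loopback address and the
variable names are only illustrative.

```python
import socket

# Listening socket on an ephemeral loopback port (illustrative only).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

client.sendall(b"x")  # one byte that the server side does not read yet
client.close()        # the peer closes its end of the connection

# Until server_side is read and closed, the kernel keeps the unread byte
# queued and the connection half-closed; that is the state netstat
# reports as CLOSE_WAIT with a non-empty Recv-Q.
server_side.settimeout(5)
pending = server_side.recv(16, socket.MSG_PEEK)  # look without consuming

data = server_side.recv(16)  # consume the queued byte
eof = server_side.recv(16)   # an empty result means the peer has closed

server_side.close()          # only now does the socket leave CLOSE_WAIT
listener.close()
```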

In another mail [2] in that old thread I found this snippet:

"""
Your issue with CLOSE_WAIT sockets could be due to registering
producers which have no further data to produce.
"""

This sounds like it could be my case, even though, as you can see, I
always unregister producers from transports before calling
loseConnection on them.

Does this sound familiar in any way? Any suggestions off the top of
your head while I try to come up with a self-contained sample that
reliably reproduces the issue, which so far I see happening only
"sometimes and quite seldom[TM]"?

Thanks, ciao
ste

[1] http://twistedmatrix.com/pipermail/twisted-python/2008-June/017853.html
[2] http://twistedmatrix.com/pipermail/twisted-python/2008-June/017855.html