[Twisted-Python] connectionLost never reached after calling loseConnection: stuck in CLOSE_WAIT forever
Stefano Debenedetti
ste at demaledetti.net
Sun Oct 17 13:00:27 EDT 2010
Hello Glyph, thanks for your reply and suggestions.
I don't have a self-contained sample yet, but at least I managed to
reproduce the issue reliably on my installation, and after a few more
experiments I think I am narrowing it down; please read below.
> On Oct 16, 2010, at 11:22 AM, Stefano Debenedetti wrote:
>
>> Does this sound familiar in any way? Any suggestions off the top of
>> your head
>> while I try to come up with a self-contained sample which is
>> reliably reproducing the issue I'm seeing happening only "sometimes
>> and quite seldom[TM]"?
>
> I can't recall having seen this exact issue in the past, but as you've described it it sounds like you may have discovered a Twisted bug. I'm looking forward to your example.
>
> I do have a few questions:
>
> * What version of Twisted are you using?
> * Have you tried a more recent version? Trunk?
I'm using 10.1.0. I haven't tested on trunk because I see basically
no difference in internet/abstract.py and internet/tcp.py, but if you
really think I should, I will give trunk a try.
> * What reactor are you using?
> * Have you tried a different reactor?
Same behavior with select, poll and epoll.
> * What platform/OS are you on? What version?
> * Have you tried a different platform?
I am using Debian lenny with a self-compiled 2.6.35.2 kernel.
/etc/debian_version says: 5.0.5
> I am also curious whether changing
>
> proto.transport.loseConnection()
>
> to
> reactor.callLater(0, proto.transport.loseConnection)
>
> makes any difference to your example.
I tried this and it didn't make any difference. Using a 1-second
delay didn't improve things either.
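That is, I also tried:

    reactor.callLater(1, proto.transport.loseConnection)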
What did make a difference was commenting out this line; the problem
never happens without it:
to.transport.registerProducer(_from.transport, True)
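For context, this is the usual two-way relay pairing where each
side's transport is registered as a streaming producer on the other,
so that a full write buffer on one side pauses reads on the other. A
simplified sketch of what my code sets up (the Relay name and the
setPeer hook are illustrative, my real code differs):

    from twisted.internet import protocol

    class Relay(protocol.Protocol):
        """One side of a two-way relay; peer is another Relay."""
        peer = None

        def setPeer(self, peer):
            self.peer = peer
            # Each transport throttles the other: when my write buffer
            # fills up, reads on the peer's transport are paused.
            self.transport.registerProducer(peer.transport, True)
            peer.transport.registerProducer(self.transport, True)

        def dataReceived(self, data):
            if self.peer is not None:
                self.peer.transport.write(data)

        def connectionLost(self, reason):
            if self.peer is not None:
                self.peer.transport.loseConnection()
                self.peer = None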
The next test I did was to register the producer as non-streaming:
to.transport.registerProducer(_from.transport, False)
This also fixes the problem, but it causes an exception to be printed
in the log once per set of A, B and C connections:
Traceback (most recent call last):
  File "/home/lala/lib/python/twisted/python/log.py", line 84, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/home/lala/lib/python/twisted/python/log.py", line 69, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/home/lala/lib/python/twisted/python/context.py", line 59, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/lala/lib/python/twisted/python/context.py", line 37, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/lala/lib/python/twisted/internet/pollreactor.py", line 184, in _doReadOrWrite
    why = selectable.doWrite()
  File "/home/lala/lib/python/twisted/internet/tcp.py", line 428, in doWrite
    result = abstract.FileDescriptor.doWrite(self)
  File "/home/lala/lib/python/twisted/internet/abstract.py", line 145, in doWrite
    self.producer.resumeProducing()
  File "/home/lala/lib/python/twisted/internet/abstract.py", line 339, in resumeProducing
    assert self.connected and not self.disconnecting
exceptions.AssertionError:
This led me to change the following lines in the doWrite code in
internet/abstract.py:
    if self.disconnecting:
        # But if I was previously asked to let the connection die, do
        # so.
        return self._postLoseConnection()
    elif self.producer is not None and ((not self.streamingProducer)
                                        or self.producerPaused):
        # tell them to supply some more.
        self.producerPaused = 0
        self.producer.resumeProducing()
    #elif self.disconnecting:
    #    # But if I was previously asked to let the connection die, do
    #    # so.
    #    return self._postLoseConnection()
Basically this just inverts the order of the checks: first check
whether we are disconnecting, then check whether a producer should be
unpaused.
This makes the above traceback disappear and still fixes my
CLOSE_WAIT problem.
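To spell out why the ordering matters: with the old ordering, doWrite
resumes the registered producer even when the connection it is
writing for is already disconnecting, and since in my case that
producer is itself a transport that has had loseConnection() called
on it, its resumeProducing() trips the assertion above. A toy model
of the two orderings (not actual Twisted code, the names are made up):

    class ToyTransport:
        """Mimics just the flags resumeProducing asserts on."""
        def __init__(self):
            self.connected = 1
            self.disconnecting = 0

        def resumeProducing(self):
            # the same assertion as abstract.py line 339 above
            assert self.connected and not self.disconnecting

    producer = ToyTransport()   # plays _from.transport
    consumer = ToyTransport()   # plays to.transport, whose doWrite runs

    # loseConnection() has been called on both sides of the relay:
    producer.disconnecting = 1
    consumer.disconnecting = 1

    # old ordering: resume the producer first, even though we are dying
    try:
        if producer is not None:
            producer.resumeProducing()      # AssertionError
        elif consumer.disconnecting:
            print("close the connection")   # never reached
    except AssertionError:
        print("old ordering: traceback as above")

    # new ordering: notice we are disconnecting before touching the
    # producer, so the dying connection gets closed instead
    if consumer.disconnecting:
        print("new ordering: _postLoseConnection() would run")
    elif producer is not None:
        producer.resumeProducing()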
But using a non-streaming producer makes my app consume a lot more
memory, so I reverted my code to register the producer as streaming:
to.transport.registerProducer(_from.transport, True)
Now the CLOSE_WAIT issue is gone, no traceback appears in the log
and my app consumes the same memory as before. Victory?
I will still try to come up with a self-contained sample which
reproduces the CLOSE_WAIT problem, but in the meantime I would like
to ask whether the above-mentioned change to the doWrite definition
in internet/abstract.py is likely to destroy the universe in the near
future or whether it actually sounds like a good idea.
> Thanks, and good luck,
>
> -glyph
Thanks a lot for your help!
ciao
ste