[Twisted-Python] Many connections and TIME_WAIT
exarkun at twistedmatrix.com
Wed Jan 27 06:05:16 MST 2010
On 04:50 am, donal.mcmullan at gmail.com wrote:
>I've been prototyping a client that connects to thousands of servers and
>calls some method. It's not real important to me at this stage whether
>that's via xmlrpc, perspective broker, or something else.
>
>What seems to happen on the client machine is that each network connection
>that gets opened and then closed goes into a TIME_WAIT state, and eventually
>there are so many connections in that state that it's impossible to create
>any more.
Yep. That's what happens to a TCP connection on the side that closes it
first.
>
>I'm keeping an eye on the output of
>netstat -an | wc -l
>Initially I've got 569 entries there. When I run my test client, that ramps
>up really quickly and peaks at about 2824. At that point, the client reports
>a callRemoteFailure:
Presumably these numbers have something to do with how quickly you're
opening and closing new connections. TIME_WAIT lasts for 2MSL (4
minutes going by the RFC 793 value of 2 minutes for MSL, though many
stacks use less; Linux uses 60 seconds) to ensure that a future
connection doesn't receive data intended for a previous connection
(clearly a bad thing).
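Since `netstat -an | wc -l` counts every line, including listening
sockets, UDP and headers, it may help to count TCP states separately. A
minimal sketch, assuming the state is the last column of each TCP line
(true of plain `netstat -an` output on the platforms I'm aware of); the
function name `count_tcp_states` is made up for illustration:

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connection states in `netstat -an` style text.

    Assumes each TCP line starts with "tcp" (any case, optionally
    followed by 4 or 6) and ends with the connection state.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, headers, UDP and UNIX-socket lines.
        if fields and fields[0].lower().startswith("tcp"):
            states[fields[-1]] += 1
    return states
```

Feeding it the text captured from `netstat -an` and looking at the
"TIME_WAIT" entry shows how much of the total is really TIME_WAIT.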
However... 2824 is a pretty low number at which to run out of sockets.
Perhaps you're running this software on Windows? I think older Windows
versions have a ridiculously small range of ephemeral ports for outgoing
connections by default (roughly 1025 through 5000). I seem to recall
this being something you can change with a registry edit (the
MaxUserPort value under the Tcpip parameters key).
Another option would be to switch to a POSIX platform instead.
If you're *not* on Windows, then this is odd and perhaps bears further
scrutiny.
>
>callRemoteFailure [Failure instance: Traceback (failure with no frames):
><class 'twisted.internet.error.ConnectionLost'>: Connection to the other
>side was lost in a non-clean fashion: Connection lost.
This isn't exactly how I'd expect it to fail, but I also don't know what
"callRemoteFailure" is or where it comes from, so maybe that's not too
surprising.
>Increasing the file descriptor limits doesn't seem to have any effect on
>this.
Quite so. The process has, after all, already closed these sockets.
They no longer count towards the process's file descriptor limit (oh
dear, I suppose you're not using Windows if you have a file descriptor
limit to raise).
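To illustrate the point about descriptors, here is a small sketch using
only the Python standard library (Unix only, since it relies on the
`resource` module). The helper names are hypothetical. It shows the
process's descriptor limit and that a closed socket object no longer
holds a descriptor; any TIME_WAIT state is tracked by the kernel, not by
the process:

```python
import resource
import socket


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def closed_socket_holds_descriptor():
    """Open and close a TCP socket, then report whether the socket
    object still refers to a file descriptor."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.close()
    # After close(), fileno() returns -1: the descriptor was released.
    return s.fileno() != -1
```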
>
>Is there an established Twisted sanctioned canonical way to free up this
>resource? Or am I doing something wrong? I'm looking into tweaking
>SO_REUSEADDR and SO_LINGER - that sound sane?
>
>Just tapping the lazywebs to see if anyone's already seen this in the wild.
On most reasonably configured Linux machines, you shouldn't run into
this problem until you're doing at least an order of magnitude more
work. Many times, I have run clients that do many thousands of new
connections per second, resulting in tens of thousands of TIME_WAIT
sockets on the system with no problem. So, I'm not sure why you're
running into this after only a few thousand.
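As a rough sanity check on those figures, the steady-state number of
TIME_WAIT sockets is about the rate of closed connections multiplied by
how long each one lingers. The sketch below does that arithmetic and
parses a port range in the two-number format Linux exposes in
/proc/sys/net/ipv4/ip_local_port_range. The function names and the
sample numbers are assumptions for illustration, not measurements from
your system:

```python
def steady_state_time_wait(connections_per_second, time_wait_seconds):
    """Approximate number of sockets sitting in TIME_WAIT once the
    rate of closes and the rate of expiries have balanced out."""
    return connections_per_second * time_wait_seconds


def parse_port_range(text):
    """Parse "low high" text, e.g. the contents of
    /proc/sys/net/ipv4/ip_local_port_range on Linux."""
    low, high = (int(part) for part in text.split())
    return low, high


def available_ports(port_range):
    """Number of ports in an inclusive (low, high) range."""
    low, high = port_range
    return high - low + 1
```

For example, 100 connections per second with a 60 second TIME_WAIT
gives about 6000 lingering sockets. The ephemeral port range only
strictly bounds connections to a single destination address and port,
since it is the full address/port 4-tuple that has to be unique.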
Jean-Paul