[Twisted-Python] intermittent problem: not accepting new connections

Jean-Paul Calderone exarkun at divmod.com
Thu Sep 11 09:46:51 EDT 2008


On Thu, 11 Sep 2008 14:31:38 +0100, "Paul C. Nendick" <paul.nendick at gmail.com> wrote:
>Not necessarily related to what you've described, but I'll share
>something that's helped a good deal on my most-heavily hit twisted
>servers. Presuming you're using Linux:
>
> echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
>
>
>from http://lartc.org/howto/lartc.kernel.obscure.html :
>
>"Enable fast recycling TIME-WAIT sockets. Default value is 1. It
>should not be changed without advice/request of technical experts"
>
>My expert advice: only use this on machines connected on a low-latency
>LAN. It *will* break internet-facing interfaces. It halves the
>constant used by the Nagle algorithm:
>
>http://en.wikipedia.org/wiki/Nagle's_algorithm
>

This is somewhat interesting.  It suggests a potential problem which
I hadn't thought about before.  If you need to accept more than about
64k connections (not necessarily concurrent) within one TIME-WAIT
interval, you might run out of ports.  Does anyone know what happens to
new connection attempts to a server in this condition?
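
To put a rough number on that estimate, here is a back-of-the-envelope
sketch.  It takes the "about 64k ports" figure at face value and assumes
a TIME-WAIT interval of 60 seconds, which is the usual Linux value; both
constants and the function name are illustrative assumptions, not
something measured on the server in question.

```python
# Assumption: 16-bit port numbers give at most 2**16 distinct ports.
PORT_SPACE = 2 ** 16
# Assumption: sockets linger in TIME-WAIT for 60 seconds (typical on Linux).
TIME_WAIT_SECONDS = 60


def rate_to_exhaust(port_space=PORT_SPACE, time_wait=TIME_WAIT_SECONDS):
    """Sustained connections per second at which every port would be
    tied up in TIME-WAIT before the oldest entry expires."""
    return port_space / time_wait


print("connections/second: %.0f" % rate_to_exhaust())
```

Under those assumptions the threshold is on the order of a thousand
connections per second sustained for a full minute.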

Alec, any idea if your server could be getting into this state every
once in a while?  This is an appealing hypothesis, because it would
explain several of the symptoms.  The problem wouldn't necessarily
happen at peak connection time (but potentially shortly after a peak),
and it would resolve itself after a short period of time.  It wouldn't
necessarily prevent all new connection attempts, since old TIME-WAIT
sockets would be gradually timing out (so your other low-volume
servers might still appear to be working normally), and it wouldn't
interfere with already established connections.  Finally, it might not
change the userspace-visible syscall behavior: that depends on what
Linux does in this case, but I wouldn't be surprised if connection
failures due to this never showed up in an accept(2) result.
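
One way to check the hypothesis would be to count TIME-WAIT sockets
while the problem is happening.  The following is a minimal sketch,
assuming a Linux host where /proc/net/tcp is readable and where state
code 06 marks TIME-WAIT entries.  The function name count_time_wait is
made up for this example, and IPv6 sockets (listed in /proc/net/tcp6)
are not covered.

```python
import os
from collections import Counter

# State code for TIME-WAIT in the "st" column of /proc/net/tcp (Linux).
TIME_WAIT = "06"


def count_time_wait(proc_net_tcp_text):
    """Return a Counter mapping local port -> number of TIME-WAIT sockets.

    The argument is the full text of /proc/net/tcp, header line included.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == TIME_WAIT:
            # local_address looks like "0100007F:1F90" (hex address:port)
            port = int(local_address.rsplit(":", 1)[1], 16)
            counts[port] += 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        per_port = count_time_wait(f.read())
    print("TIME-WAIT sockets: %d" % sum(per_port.values()))
    for port, n in per_port.most_common(5):
        print("  port %d: %d" % (port, n))
```

Running this from cron every few seconds and logging the total would
show whether the count climbs toward the tens of thousands shortly
before the server stops accepting connections.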

Jean-Paul



