[Twisted-Python] intermittent problem: not accepting new connections

Alec Matusis matusis at yahoo.com
Wed Sep 10 17:01:27 EDT 2008


> Do non-error log messages continue to appear in the Twisted log?  ie, is
> it clear that the logging system is still working, or could it have failed
> in some way, obscuring any exception reports?

Yes, I print the number of objects tracked by the garbage collector every 10 minutes, like this:

print "number of objects tracked by the garbage collector is:", len(gc.get_objects())

and these lines keep appearing in the main Twisted log:

2008/09/10 13:21 -0700 [-] number of objects tracked by the garbage collector is: 860021
2008/09/10 13:31 -0700 [-] number of objects tracked by the garbage collector is: 864316

> Any new unhandled errno values should definitely result in an exception
> being logged (notice that the `raise` which follows the checks for
> various errno values is inside a try/except which logs any exception).

I noticed that raise too. Could it then be EWOULDBLOCK, EAGAIN or EPERM, which are handled without logging anything?

                except socket.error, e:
                    if e.args[0] in (EWOULDBLOCK, EAGAIN):
                        self.numberAccepts = i
                        break
                    elif e.args[0] == EPERM:
                        # Netfilter on Linux may have rejected the
                        # connection, but we get told to try to accept()
                        # anyway.
                        continue
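
To make the silent branches concrete, here is a minimal standalone sketch using only the standard library. It is not the Twisted code: the function name accept_pending and the max_accepts parameter are made up for illustration, and it only mirrors the errno checks from the excerpt above. It shows that a non-blocking accept() on an empty queue ends in the EWOULDBLOCK/EAGAIN branch without anything being logged, while any other errno is re-raised.

```python
import errno
import socket


def accept_pending(listener, max_accepts=10):
    """Accept queued connections from a non-blocking listening socket.

    Hypothetical helper, not part of Twisted. Returns a tuple
    (accepted_sockets, reason), where reason is "drained" when the accept
    queue was empty and "limit" when max_accepts attempts were used up.
    """
    accepted = []
    for _ in range(max_accepts):
        try:
            conn, _addr = listener.accept()
        except socket.error as e:
            if e.args[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                # Nothing is waiting in the accept queue: stop silently.
                return accepted, "drained"
            elif e.args[0] == errno.EPERM:
                # Same silent "try the next one" branch as in the excerpt.
                continue
            # Any other errno is surfaced to the caller instead of hidden.
            raise
        accepted.append(conn)
    return accepted, "limit"


# Demonstration on a fresh listener with no clients connected.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(5)
listener.setblocking(False)
conns, reason = accept_pending(listener)
print(len(conns), reason)
listener.close()
```

If the server were stuck in one of these two branches during the outage, nothing would show up in the log, which would be consistent with what I see.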


I am not sure how to debug this problem. I have another Twisted server of a different type on the same machine, and while the problematic server stops accepting connections, the second one keeps working just fine, so this is not a machine-wide issue.
What could it be?
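
One way to narrow this down would be to record exactly when the outages start and stop, so they can be lined up against the logs. The following is only a sketch under my own assumptions: the helper name probe and the 5 second default timeout are made up, and it does the same thing as the telnet check quoted below, just in a form that can be run from a timer.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try to open a TCP connection and report how it went.

    Hypothetical diagnostic helper. Returns (ok, seconds): ok is True if
    the connect completed within the timeout, False if it timed out or
    was refused; seconds is how long the attempt took.
    """
    start = time.time()
    try:
        s = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, socket.error):
        return False, time.time() - start
    s.close()
    return True, time.time() - start
```

Calling probe("localhost", 5229) once a minute and printing the result with a timestamp would show whether connects hang for the whole 5-10 minutes or only intermittently. Note that a successful connect only shows that the kernel completed the handshake, not that the application has called accept() yet.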

> -----Original Message-----
> From: twisted-python-bounces at twistedmatrix.com [mailto:twisted-python-
> bounces at twistedmatrix.com] On Behalf Of Jean-Paul Calderone
> Sent: Wednesday, September 10, 2008 1:44 PM
> To: Twisted general discussion
> Subject: Re: [Twisted-Python] intermittent problem: not accepting new
> connections
> 
> On Wed, 10 Sep 2008 13:32:06 -0700, Alec Matusis <matusis at yahoo.com>
> wrote:
> >I have had a twisted epoll server that was heavily used, such that it
> >saturated CPU (100% shown by "top", about 5000 connections, intense
> >message relaying).
> >I am using twisted 2.5.0 that I patched for epoll bug.
> >It was run on python 2.4.4, 2.6.11 kernel on a single core xeon 3.0 GHz
> >CPU. This server has been on for many months, and it has been rock-stable.
> >
> >A couple of days ago I migrated that server to a newer machine: same
> >patched twisted 2.5.0, same python 2.4.4, newer 2.6.24 kernel and a quad
> >core xeon L5420 CPU.
> >CPU usage dropped from 100% to 30%, as expected, with the same rate of
> >client connections.
> >
> >However the server now has the following intermittent problem: about
> >twice a day, it stops accepting new connections for a short period of
> >5-10 minutes.
> >
> >telnet times out, I get this:
> >root at serv2:/proc/net/netfilter# telnet localhost 5229
> >
> >Trying 127.0.0.1...
> >
> >Existing connections are not cut, the server receives/delivers messages
> >to/from them just fine.
> >These short periods of not accepting connections do not correlate with
> >increased CPU load or with the overall number of connections to the
> >server.
> >
> >I have had a problem with the same symptoms before, when a server process
> >ran out of its quota of file descriptors.
> >However, there were clear messages in the twisted log at that time, and
> >upping the ulimits solved the problem.
> >This time, there are no errors in ANY logs (twisted log,
> >/var/log/messages, etc.)
> 
> Do non-error log messages continue to appear in the Twisted log?  ie, is
> it clear that the logging system is still working, or could it have failed
> in some way, obscuring any exception reports?
> 
> >
> >I am out of ideas on what this could be, because my setup is exactly the
> >same as I have been using in the last year, except for a faster CPU and a
> >newer kernel.
> >
> >I suspect that there are some new uncaught accept() exceptions in
> >internet/tcp.py in the part where it's looking for EMFILE, ENOBUFS,
> >ENFILE, ENOMEM, ECONNABORTED errors.
> >
> 
> Any new unhandled errno values should definitely result in an exception
> being logged (notice that the `raise` which follows the checks for
> various errno values is inside a try/except which logs any exception).
> 
> Jean-Paul