[Twisted-Python] intermittent problem: not accepting new connections

Alec Matusis matusis at yahoo.com
Wed Sep 10 20:28:07 EDT 2008


> You could try running strace against the server while it's not
> accepting
> connections.  


It went into this regime again tonight, and I captured strace.

A few remarks first:

The outage was brief, it lasted about 3 min. It did not correspond to the peak number of connections.
The application has several protocols (like manhole and various control ports), that are listening on different ports.
When the server stopped accepting connections on the main port (where external internet clients connect), telnet connection to manhole and control ports are still possible.
Also, I think when it is in this bad state, it does accept some external connections, maybe 50%.

Anyway, here's the strace in the bad state:

accept(9, {sa_family=AF_INET, sin_port=htons(1474), sin_addr=inet_addr("83.153.87.198")}, [13980580626831507472]) = 491
accept(13, {sa_family=AF_INET, sin_port=htons(40953), sin_addr=inet_addr("10.18.2.10")}, [17987095445330984976]) = 491
--- SIGCHLD (Child exited) @ 0 (0) ---
accept(9, {sa_family=AF_INET, sin_port=htons(50679), sin_addr=inet_addr("24.166.18.207")}, [17853676306370134032]) = 491
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(52873), sin_addr=inet_addr("67.87.61.193")}, [9929874236988456976]) = 493
accept(9, {sa_family=AF_INET, sin_port=htons(54305), sin_addr=inet_addr("200.114.45.212")}, [2437573306904215568]) = 491
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(2246), sin_addr=inet_addr("76.11.66.253")}, [14269655427913351184]) = 491
accept(9, {sa_family=AF_INET, sin_port=htons(32824), sin_addr=inet_addr("75.62.105.81")}, [4071254071732862992]) = 491
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(2262), sin_addr=inet_addr("76.11.66.253")}, [15422576932520198160]) = 491
accept(9, {sa_family=AF_INET, sin_port=htons(3316), sin_addr=inet_addr("24.127.249.90")}, [17585430653564878864]) = 491
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(4466), sin_addr=inet_addr("72.241.229.233")}, [8219350803517800464]) = 491
accept(9, {sa_family=AF_INET, sin_port=htons(1651), sin_addr=inet_addr("67.82.141.97")}, [8288312172811911184]) = 491
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(2669), sin_addr=inet_addr("24.94.231.135")}, [7857092508491186192]) = 398
accept(9, {sa_family=AF_INET, sin_port=htons(50252), sin_addr=inet_addr("97.119.52.164")}, [5531546250907746320]) = 398
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(2675), sin_addr=inet_addr("70.4.222.71")}, [8289438072718753808]) = 398
accept(9, {sa_family=AF_INET, sin_port=htons(63671), sin_addr=inet_addr("69.22.255.174")}, [13256345511754989584]) = 491
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(54520), sin_addr=inet_addr("92.130.148.171")}, [17929956025058721808]) = 491
accept(13, {sa_family=AF_INET, sin_port=htons(41056), sin_addr=inet_addr("10.18.2.10")}, [6962565032504721424]) = 491
accept(13, 0x7fffc3d5ca30, [16])        = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(49296), sin_addr=inet_addr("66.214.238.42")}, [10430336745580003344]) = 161
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
--- SIGCHLD (Child exited) @ 0 (0) ---
accept(13, {sa_family=AF_INET, sin_port=htons(41199), sin_addr=inet_addr("10.18.2.10")}, [17266800979928416272]) = 161
accept(9, {sa_family=AF_INET, sin_port=htons(49376), sin_addr=inet_addr("68.34.251.86")}, [16194944268614238224]) = 161
--- SIGCHLD (Child exited) @ 0 (0) ---
accept(13, {sa_family=AF_INET, sin_port=htons(41497), sin_addr=inet_addr("10.18.2.10")}, [1847038805765259280]) = 161
accept(13, 0x7fffc3d5ca30, [16])        = -1 EAGAIN (Resource temporarily unavailable)
--- SIGCHLD (Child exited) @ 0 (0) ---
accept(9, {sa_family=AF_INET, sin_port=htons(50013), sin_addr=inet_addr("71.241.76.86")}, [6756243874575810576]) = 161
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
--- SIGCHLD (Child exited) @ 0 (0) ---
--- SIGCHLD (Child exited) @ 0 (0) ---
accept(9, {sa_family=AF_INET, sin_port=htons(2594), sin_addr=inet_addr("24.27.123.223")}, [2452772955646590992]) = 161
accept(9, {sa_family=AF_INET, sin_port=htons(1690), sin_addr=inet_addr("67.242.75.46")}, [11098558340291100688]) = 276
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(13, {sa_family=AF_INET, sin_port=htons(41825), sin_addr=inet_addr("10.18.2.10")}, [7035467051472781328]) = 276
accept(13, {sa_family=AF_INET, sin_port=htons(41910), sin_addr=inet_addr("10.18.2.10")}, [13160362544696655888]) = 276



here's the strace in the normal state:


accept(9, {sa_family=AF_INET, sin_port=htons(40442), sin_addr=inet_addr("74.140.139.210")}, [18058590089415491600]) = 2218
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(58477), sin_addr=inet_addr("99.194.115.132")}, [7918454053414109200]) = 431
accept(13, {sa_family=AF_INET, sin_port=htons(50878), sin_addr=inet_addr("10.18.2.10")}, [13746674921184952336]) = 356
accept(9, {sa_family=AF_INET, sin_port=htons(1154), sin_addr=inet_addr("190.37.68.45")}, [9368613133427408912]) = 356
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
--- SIGCHLD (Child exited) @ 0 (0) ---
accept(9, {sa_family=AF_INET, sin_port=htons(1212), sin_addr=inet_addr("125.60.240.211")}, [13547953587627229200]) = 356
accept(9, {sa_family=AF_INET, sin_port=htons(1158), sin_addr=inet_addr("190.37.68.45")}, [9656843509579120656]) = 2223
accept(9, {sa_family=AF_INET, sin_port=htons(52008), sin_addr=inet_addr("24.189.205.204")}, [2939443190379315216]) = 2224
accept(9, {sa_family=AF_INET, sin_port=htons(50808), sin_addr=inet_addr("72.155.68.155")}, [8702643338529996816]) = 2253
accept(9, {sa_family=AF_INET, sin_port=htons(2545), sin_addr=inet_addr("24.10.61.223")}, [17368413446520963088]) = 2263
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(2547), sin_addr=inet_addr("24.10.61.223")}, [17512528634596818960]) = 356
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(52012), sin_addr=inet_addr("24.189.205.204")}, [3227673566531026960]) = 180
accept(9, {sa_family=AF_INET, sin_port=htons(50809), sin_addr=inet_addr("72.155.68.155")}, [8774700932567924752]) = 2224
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(1223), sin_addr=inet_addr("125.60.240.211")}, [14340587122044436496]) = 2253
accept(9, {sa_family=AF_INET, sin_port=htons(2462), sin_addr=inet_addr("206.74.170.35")}, [11387633141372944400]) = 2263
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(2464), sin_addr=inet_addr("206.74.170.35")}, [11531748329448800272]) = 2263
accept(9, {sa_family=AF_INET, sin_port=htons(1645), sin_addr=inet_addr("61.145.215.253")}, [7855966608584343568]) = 2224
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(4172), sin_addr=inet_addr("68.188.149.222")}, [5480880755099828240]) = 1283
accept(9, {sa_family=AF_INET, sin_port=htons(50931), sin_addr=inet_addr("70.129.57.218")}, [17565727405195132944]) = 1283
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(4173), sin_addr=inet_addr("68.188.149.222")}, [5552938349137756176]) = 2224
accept(9, {sa_family=AF_INET, sin_port=htons(1122), sin_addr=inet_addr("76.127.204.185")}, [7062770124213714960]) = 1283
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(13, {sa_family=AF_INET, sin_port=htons(51062), sin_addr=inet_addr("10.18.2.10")}, [8558809625430851600]) = 2292
accept(13, 0x7fffc3d5ca30, [16])        = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(1646), sin_addr=inet_addr("61.145.215.253")}, [7928024202622271504]) = 2292
accept(9, {sa_family=AF_INET, sin_port=htons(50932), sin_addr=inet_addr("70.129.57.218")}, [17637784999233060880]) = 2310
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(4001), sin_addr=inet_addr("99.13.103.221")}, [11605494773346992144]) = 507
accept(9, {sa_family=AF_INET, sin_port=htons(4004), sin_addr=inet_addr("99.13.103.221")}, [11821667555460775952]) = 507
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(1131), sin_addr=inet_addr("76.127.204.185")}, [7711288470555066384]) = 1283
accept(9, {sa_family=AF_INET, sin_port=htons(2143), sin_addr=inet_addr("202.156.50.129")}, [6847723242006773776]) = 1894
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(2146), sin_addr=inet_addr("202.156.50.129")}, [7063896024120557584]) = 1894
accept(9, {sa_family=AF_INET, sin_port=htons(1779), sin_addr=inet_addr("222.165.66.136")}, [17511684209666686992]) = 1278
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(61674), sin_addr=inet_addr("99.230.169.52")}, [16929031007875629072]) = 1278
accept(9, {sa_family=AF_INET, sin_port=htons(61679), sin_addr=inet_addr("99.230.169.52")}, [17289318978065268752]) = 1278
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(1780), sin_addr=inet_addr("222.165.66.136")}, [17583741803704614928]) = 1913
accept(9, {sa_family=AF_INET, sin_port=htons(52495), sin_addr=inet_addr("190.40.52.74")}, [1138566289384538128]) = 2310
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(13, {sa_family=AF_INET, sin_port=htons(51181), sin_addr=inet_addr("10.18.2.10")}, [17133663315944275984]) = 2310
accept(9, {sa_family=AF_INET, sin_port=htons(60995), sin_addr=inet_addr("88.105.44.33")}, [4894849853588242448]) = 2310
accept(9, {sa_family=AF_INET, sin_port=htons(3636), sin_addr=inet_addr("71.64.15.79")}, [3750935548236136464]) = 2310
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(13, {sa_family=AF_INET, sin_port=htons(51273), sin_addr=inet_addr("10.18.2.10")}, [5316499368700805136]) = 2334
accept(13, 0x7fffc3d5ca30, [16])        = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(52498), sin_addr=inet_addr("190.40.52.74")}, [1354739071498321936]) = 2334
accept(9, {sa_family=AF_INET, sin_port=htons(1440), sin_addr=inet_addr("24.21.117.196")}, [11530622429541957648]) = 2334
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(9, {sa_family=AF_INET, sin_port=htons(1442), sin_addr=inet_addr("24.21.117.196")}, [11674737617617813520]) = 2334
accept(9, {sa_family=AF_INET, sin_port=htons(60996), sin_addr=inet_addr("88.105.44.33")}, [4966907447626170384]) = 2336
accept(9, 0x7fffc3d5ca30, [16])         = -1 EAGAIN (Resource temporarily unavailable)
accept(13, {sa_family=AF_INET, sin_port=htons(51345), sin_addr=inet_addr("10.18.2.10")}, [10504646139431616528]) = 2337
accept(13, {sa_family=AF_INET, sin_port=htons(51348), sin_addr=inet_addr("10.18.2.10")}, [10720818921545400336]) = 2337

The only noticeable difference is the frequency of 
--- SIGCHLD (Child exited) @ 0 (0) ---
but it occurred in the normal state as well.
The server sends emails via child subprocesses, so I suspect Child exited may be due to that.

If that's the case, the strace output is identical, and I do not understand what's causing the problem at all.


> -----Original Message-----
> From: twisted-python-bounces at twistedmatrix.com [mailto:twisted-python-
> bounces at twistedmatrix.com] On Behalf Of Jean-Paul Calderone
> Sent: Wednesday, September 10, 2008 2:47 PM
> To: Twisted general discussion
> Subject: RE: [Twisted-Python] intermittent problem: not accepting new
> connections
> 
> On Wed, 10 Sep 2008 14:01:27 -0700, Alec Matusis <matusis at yahoo.com>
> wrote:
> >> Do non-error log messages continue to appear in the Twisted log?
> ie,
> >> is
> >> it clear that the logging system is still working, or could it have
> >> failed
> >> in some way, obscuring any exception reports?
> >
> >Yes, I print the results of garbage collection every 10 min like this:
> >
> >print "number of objects tracked by the garbage collector is:",
> len(gc.get_objects())
> >
> >and they appear in the main twisted log.
> >
> >2008/09/10 13:21 -0700 [-] number of objects tracked by the garbage
> collector is: 860021
> >2008/09/10 13:31 -0700 [-] number of objects tracked by the garbage
> collector is: 864316
> >
> >> Any new unhandled errno values should definitely result in an
> exception
> >> being logged (notice that the `raise´ which follows the checks for
> >> various errno values is inside a try/except which logs any
> exception).
> >
> >I noticed that raise too... Could it then be EWOULDBLOCK, EAGAIN or
> EPERM?
> >
> >                except socket.error, e:
> >                    if e.args[0] in (EWOULDBLOCK, EAGAIN):
> >                        self.numberAccepts = i
> >                        break
> >                    elif e.args[0] == EPERM:
> >                        # Netfilter on Linux may have rejected the
> >                        # connection, but we get told to try to
> accept()
> >                        # anyway.
> >                        continue
> >
> >
> >I am not sure how to debug this problem- I have another twisted server
> of a different type on that machine, and while the problematic server
> stops accepting connections, the second one works just fine, so this is
> not a machine-wide issue.
> >What could it be?
> 
> You could try running strace against the server while it's not
> accepting
> connections.  You can limit the output to just accept calls, too:
> 
>    strace -p <pid> -etrace=accept
> 
> Maybe this will show you that accept calls are happening but failing in
> some way.  If it produces no output, you can widen it and see if you
> can
> tell what the server is doing if it isn't calling accept().
> 
> Jean-Paul
> 
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python at twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python





More information about the Twisted-Python mailing list