[Twisted-Python] epoll reactor problems

Alec Matusis matusis at matusis.com
Wed Apr 11 14:38:02 EDT 2007


I now suspect something is wrong with Twisted 2.5:

Both servers now show 99.9% CPU using top:

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
6929 alecm     16   0  148m  82m 3588 R 99.9  2.1 414:29.24 twistd

4083 alecm     17   0  135m  70m 3640 R 99.9  1.8 421:00.05 twistd

The first one, 6929, is now using poll; 4083 is using epoll. At this time of
day, with Twisted 2.2 and poll, CPU usage would normally be 15% for 4083 and
80% for 6929. These numbers had been pretty stable every day for at least a
month.
Both servers still seem to be responsive, however. There were no code changes.

ps shows different %CPU at the same time:

#ps -C twistd -o pcpu,cmd,pid,size
%CPU CMD                           PID    SZ
61.0 /usr/bin/python /usr/local/  4083 82240
66.5 /usr/bin/python /usr/local/  6929 97300
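Part of the gap between top's 99.9% and ps's 61% / 66.5% is expected: procps `ps` reports %CPU as the CPU time used divided by the time the process has been running, i.e. an average over the whole life of the process, whereas `top` shows usage over its last refresh interval. Below is a minimal sketch of the ps-style figure, assuming the Linux `/proc/<pid>/stat` field layout from proc(5) (utime, stime and starttime in clock ticks) and `/proc/uptime`; the function names are hypothetical.

```python
import os


def lifetime_percent(cpu_seconds, elapsed_seconds):
    """CPU time as a percentage of wall-clock time since the process started."""
    return 100.0 * cpu_seconds / elapsed_seconds if elapsed_seconds > 0 else 0.0


def ps_style_cpu_percent(pid):
    """Approximate ps's %CPU column for a PID (an int or "self")."""
    hz = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second
    with open("/proc/%s/stat" % pid) as f:
        stat = f.read()
    # Field 2 (comm) is in parentheses and may contain spaces,
    # so only split what comes after the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    # fields[0] is field 3 in proc(5): utime=14, stime=15, starttime=22.
    cpu_seconds = (int(fields[11]) + int(fields[12])) / float(hz)
    start_seconds = int(fields[19]) / float(hz)  # seconds after boot
    with open("/proc/uptime") as f:
        uptime = float(f.read().split()[0])
    return lifetime_percent(cpu_seconds, uptime - start_seconds)
```

A process that has spent most of its life near 15-80% and only recently pegged the CPU will therefore show a much lower ps figure than top does.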

I wonder whether we should revert to Twisted 2.2 ASAP.


> -----Original Message-----
> From: twisted-python-bounces at twistedmatrix.com [mailto:twisted-python-bounces at twistedmatrix.com] On Behalf Of Thomas Hervé
> Sent: Wednesday, April 11, 2007 4:47 AM
> To: twisted-python at twistedmatrix.com
> Subject: RE: [Twisted-Python] epoll reactor problems
> 
> Quoting Alec Matusis <matusis at matusis.com>:
> 
> >> That's old (Debian stable? :)). I don't say that'll solve your
> >> problem, but you could try with 2.4.4 (warning: not 2.4.3).
> >
> > It's SuSE stable ;-) Our stuff on that machine is pretty convoluted
> > now, so we will probably only get a chance to test with 2.4.4 in a
> > week, when we add a brand-new server with 2.4.4.
> 
> OK. That is just another thing to try; I don't see any obvious reason
> why it would work better on 2.4.4, but...
> 
> > I noticed a difference between this from the 99.9% CPU server:
> >
> > epoll_wait(4, {{EPOLLERR|EPOLLHUP, {u32=423, u64=12304606485815493031}},
> > {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDNORM|EPOLLRDBAND|EPOLLWRNORM|EPOLLWRBAND|EPOLLMSG|EPOLLERR|EPOLLHUP|0x7820, {u32=5529648, u64=5529648}},
> > {EPOLLIN|EPOLLPRI|EPOLLRDNORM|EPOLLRDBAND|EPOLLMSG|0x1000, {u32=0,
> > u64=22827751178240}}, {0, {u32=0, u64=0}},
> > {EPOLLOUT|EPOLLERR|EPOLLONESHOT|EPOLLET|0x3fffa820, {u32=32767,
> > u64=18097643565645823}}}, 1432, 68) = 5
> >
> > and this from a normal server running at 5% CPU:
> >
> > epoll_wait(4, {{EPOLLIN, {u32=1769, u64=12304606485815494377}}, {0,
> > {u32=4294944684, u64=140737488332716}}}, 1728, 17) = 2
> >
> > What does this mean?
> 
> The flags set on your sockets are generally EPOLLIN or EPOLLOUT: data
> to read, or space available for writing. I don't know much about the
> other flags. EPOLLERR is set if the fd has been closed, for example.
> EPOLLET is *highly* suspect, because it should only be there if set in
> the user code. The documentation of the other flags is really terse...
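To compare the two traces flag by flag, it can help to decode a raw epoll event mask into names. This is a minimal sketch, assuming the bit values from the Linux `<sys/epoll.h>` header (check them against your own kernel headers); `decode_epoll_events` is a hypothetical helper. Bits it does not know about are returned as a leftover value, much like the trailing `0x7820` and `0x3fffa820` that strace prints above.

```python
# Bit values as defined in Linux <sys/epoll.h> (assumed; verify locally).
EPOLL_FLAGS = [
    ("EPOLLIN", 0x001),
    ("EPOLLPRI", 0x002),
    ("EPOLLOUT", 0x004),
    ("EPOLLERR", 0x008),
    ("EPOLLHUP", 0x010),
    ("EPOLLRDNORM", 0x040),
    ("EPOLLRDBAND", 0x080),
    ("EPOLLWRNORM", 0x100),
    ("EPOLLWRBAND", 0x200),
    ("EPOLLMSG", 0x400),
    ("EPOLLRDHUP", 0x2000),
    ("EPOLLONESHOT", 1 << 30),
    ("EPOLLET", 1 << 31),
]


def decode_epoll_events(mask):
    """Split an event mask into known flag names plus any unrecognised bits."""
    names = []
    for name, bit in EPOLL_FLAGS:
        if mask & bit:
            names.append(name)
            mask &= ~bit  # clear the bit so only unknown bits remain
    return names, mask
```

For example, `decode_epoll_events(0x001)` returns `(['EPOLLIN'], 0)`, matching the single event in the trace from the normal server.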
> 
> 
> >> What's the global state of the process? Memory, number of open fds?
> >
> > We immediately reverted to poll, so I do not have it in front of me.
> > The RSS size was 45MB. I do not know the number of open fds: it should
> > have been about 1500, but I did not check.
> 
> Hmm... it may come from running out of file descriptors, so you'd
> better check your settings for this.
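One way to check the file-descriptor theory on Linux is to compare the number of entries in `/proc/<pid>/fd` with the soft "Max open files" limit in `/proc/<pid>/limits`. A rough sketch, assuming a kernel that provides `/proc/<pid>/limits` and that it runs as the same user as twistd (or root); the helper names are hypothetical.

```python
import os


def open_fd_count(pid="self"):
    """Number of open descriptors, from /proc/<pid>/fd (Linux only).

    For pid="self" this may include the descriptor os.listdir uses
    to read the directory.
    """
    return len(os.listdir("/proc/%s/fd" % pid))


def max_open_files(pid="self"):
    """Soft 'Max open files' limit from /proc/<pid>/limits; None if unlimited."""
    with open("/proc/%s/limits" % pid) as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: soft limit, hard limit, units
                return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' line in /proc/%s/limits" % pid)


def fd_report(pid="self"):
    """Return (open fds, soft limit, fraction of the limit in use or None)."""
    used, limit = open_fd_count(pid), max_open_files(pid)
    return used, limit, (float(used) / limit if limit else None)
```

With roughly 1500 connections, `fd_report(4083)` would show whether the process is close to a default soft limit such as 1024.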
> 
> > I can do another test run with epoll in about 20 hours, since I do
> > not want to upset users too much.
> 
> Of course :).
> 
> > If you have some specific data I should get from the
> > test run, please let me know now.
> 
> Any information would be useful. The most useful would be to know when
> it begins to act strangely, and whether something happened at that
> moment. Otherwise: number of fds, memory, netstat output, strace
> output...
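To find out when the process starts to misbehave, a small watcher can sample CPU usage over fixed intervals (top-style, unlike ps's lifetime average) together with RSS, and report the first interval above a threshold. A minimal sketch, assuming Linux `/proc/<pid>/stat` and `/proc/<pid>/status` as described in proc(5); `monitor`, `first_spike` and the 90% threshold are hypothetical choices.

```python
import os
import time


def read_cpu_ticks(pid):
    """utime + stime in clock ticks (fields 14 and 15 of /proc/<pid>/stat)."""
    with open("/proc/%s/stat" % pid) as f:
        stat = f.read()
    fields = stat[stat.rindex(")") + 2:].split()  # skip "pid (comm)"
    return int(fields[11]) + int(fields[12])


def read_rss_kb(pid):
    """Resident set size in kB from the VmRSS line of /proc/<pid>/status."""
    with open("/proc/%s/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def monitor(pid, interval=60.0, count=None):
    """Yield (wall-clock time, CPU % over the last interval, RSS kB)."""
    hz = os.sysconf("SC_CLK_TCK")
    prev_ticks, prev_t = read_cpu_ticks(pid), time.monotonic()
    n = 0
    while count is None or n < count:
        time.sleep(interval)
        ticks, now = read_cpu_ticks(pid), time.monotonic()
        cpu = 100.0 * (ticks - prev_ticks) / hz / (now - prev_t)
        yield time.time(), cpu, read_rss_kb(pid)
        prev_ticks, prev_t = ticks, now
        n += 1


def first_spike(samples, threshold=90.0):
    """Return the first sample whose CPU % exceeds threshold, or None."""
    for sample in samples:
        if sample[1] > threshold:
            return sample
    return None
```

For example, `first_spike(monitor(4083, interval=60))` blocks until the first minute in which PID 4083 averages more than 90% CPU and returns that sample; its timestamp can then be matched against netstat, strace or application logs.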
> 
> --
> Thomas
> 
> 
> 
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python at twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python




