[Twisted-Python] new epoll error after upgrading to 9.0.0
exarkun at twistedmatrix.com
Thu Feb 11 21:54:37 EST 2010
On 12:16 am, matusis at yahoo.com wrote:
>I upgraded to 9.0.0 and I am now seeing a new error, not present in
>8.2.0 or earlier:
>
>[snip]
>"/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/abstract.py", line 267, in stopWriting
>    self.reactor.removeWriter(self)
>  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 145, in removeWriter
>    self._remove(writer, self._writes, self._reads, self._selectables, _epoll.OUT, _epoll.IN)
>  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 131, in _remove
>    self._poller._control(cmd, fd, flags)
>  File "_epoll.pyx", line 125, in _epoll.epoll._control
>
>exceptions.IOError: [Errno 2] No such file or directory
>
>The error is highly intermittent and occurs only under a high client
>connection rate. Any idea what this could be?
Translating into English, a descriptor being monitored for writability
is being removed from the reactor, but epoll reports (via errno 2,
ENOENT) that it isn't being monitored in the first place.
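To make the errno concrete, here is a small sketch using the standard library's select.epoll rather than Twisted's own _epoll wrapper (that substitution is an assumption made for illustration, as is the function name). It removes the same descriptor twice and returns the errno from the second attempt, which on Linux should be ENOENT. On platforms without epoll it simply returns None.

```python
import errno
import os
import select


def errno_from_double_remove():
    """Unregister one fd from epoll twice; return the errno of the second try.

    Hypothetical helper for illustration only. Returns None when
    select.epoll is unavailable (non-Linux platforms) and 0 if the
    second removal unexpectedly succeeds.
    """
    if not hasattr(select, "epoll"):
        return None
    read_fd, write_fd = os.pipe()
    poller = select.epoll()
    try:
        poller.register(write_fd, select.EPOLLOUT)
        poller.unregister(write_fd)  # first removal: fd is known to epoll
        try:
            poller.unregister(write_fd)  # second removal: fd is no longer known
        except (IOError, OSError) as exc:
            return exc.errno
        return 0
    finally:
        poller.close()
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    result = errno_from_double_remove()
    if result is not None:
        print(errno.errorcode.get(result, result))
```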
It seems likely this is caused by an attempt to double remove something.
However, why that would happen will probably take a bit more digging.
There was one direct change to epollreactor.py between 8.2 and 9.0:
http://twistedmatrix.com/trac/changeset/26118#file1
It was to reactor shutdown code, though, so it seems like it probably
isn't coming into play in your case.
A number of other indirect changes were made, though (e.g. to the epoll
reactor's base classes or other support code it uses). It's conceivable
one of these introduced the problem. One could also imagine that the
problem existed all along, and one of the changes merely nudged some
race condition and now it's going badly for your app.
As far as suggestions for how to track this down go...
Well, minimizing the example is always nice. ;) Aside from that, one
idea that presents itself to me is to instrument the reactor to record
addWriter/removeWriter events, and then log the complete stream of them
for a particular writer when a double removeWriter is attempted.
Initially you might just track that they happen, and use the result to
confirm or reject the double removeWriter hypothesis. If it holds up,
it might be useful to add stack recording, in order to see why things
are happening.
It may even be easy to implement this as a tiny reactor wrapper, which
would make it easier to deploy and enable/disable. If this doesn't
disrupt your production environment overly, it might be worth trying.
Keep us updated. :)
Jean-Paul