[Twisted-Python] new epoll error after upgrading to 9.0.0

exarkun at twistedmatrix.com
Thu Feb 11 19:54:37 MST 2010


On 12:16 am, matusis at yahoo.com wrote:
>I upgraded to 9.0.0 and I am now seeing a new error, not present in 
>8.2.0 or earlier:
>
>[snip]
>"/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/abstract.py", line 267, in stopWriting
>            self.reactor.removeWriter(self)
>          File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 145, in removeWriter
>            self._remove(writer, self._writes, self._reads, self._selectables, _epoll.OUT, _epoll.IN)
>          File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 131, in _remove
>            self._poller._control(cmd, fd, flags)
>          File "_epoll.pyx", line 125, in _epoll.epoll._control
>
>        exceptions.IOError: [Errno 2] No such file or directory
>
>The error is highly intermittent and occurs only under a high client 
>connection rate. Any idea what this could be?

Translating into English: a descriptor being monitored for writability 
is being removed from the reactor, but epoll thinks it isn't being 
monitored in the first place.

It seems likely this is caused by an attempt to remove something twice. 
However, working out why that would happen will probably take a bit more 
digging.
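
For what it's worth, the same errno can be provoked outside Twisted.  The 
sketch below uses the standard library's select.epoll rather than the 
_epoll extension from the traceback, so it is an analogy and not the 
actual code path.  The function name is made up for illustration, and it 
only does anything on Linux.

```python
import errno
import os
import select


def double_remove_errno():
    """Register a descriptor with epoll, then remove it twice.

    Returns the errno raised by the second removal, 0 if the second
    removal unexpectedly succeeds, or None where select.epoll does not
    exist (i.e. anywhere but Linux).
    """
    if not hasattr(select, "epoll"):
        return None
    read_end, write_end = os.pipe()
    poller = select.epoll()
    try:
        poller.register(write_end, select.EPOLLOUT)
        poller.unregister(write_end)      # first removal: fine
        try:
            poller.unregister(write_end)  # second removal: epoll no longer knows the fd
        except OSError as e:
            return e.errno
        return 0
    finally:
        poller.close()
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    code = double_remove_errno()
    if code is not None:
        print(errno.errorcode.get(code, code))
```

On Linux the second removal should fail with ENOENT, which is the 
"[Errno 2] No such file or directory" in the report.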

There was one direct change to epollreactor.py between 8.2 and 9.0:

  http://twistedmatrix.com/trac/changeset/26118#file1

It was to reactor shutdown code, though, so it seems like it probably 
isn't coming into play in your case.

A number of other indirect changes were made, though (e.g. to the epoll 
reactor's base classes or other support code it uses).  It's conceivable 
one of these introduced the problem.  One could also imagine that the 
problem existed all along, and one of the changes merely nudged some 
race condition and now it's going badly for your app.

As far as suggestions for how to track this down go...

Well, minimizing the example is always nice. ;)  Aside from that, one 
idea that presents itself to me is to instrument the reactor to record 
addWriter/removeWriter events, and then log the complete stream of them 
for a particular writer when a double removeWriter is attempted. 
Initially you might just track that they happen, and use the result to 
confirm or reject the double removeWriter hypothesis.  If it holds up, 
it might be useful to add stack recording, in order to see why things 
are happening.

It may even be easy to implement this as a tiny reactor wrapper, which 
would make it easier to deploy and enable/disable.  If this doesn't 
disrupt your production environment overly, it might be worth trying.
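
A rough sketch of what such a wrapper might look like follows.  Everything 
in it (the class name, the `suspicious` list, the `record_stacks` flag, 
the fake reactor used to exercise it) is hypothetical and not part of 
Twisted.  It also assumes the code calling addWriter/removeWriter 
actually goes through the wrapper; how to arrange that in a real 
application is left out.

```python
import traceback


class WriterAuditingReactor(object):
    """Hypothetical wrapper: forwards to a real reactor, logging writer events."""

    def __init__(self, reactor, record_stacks=False):
        self._reactor = reactor
        self._record_stacks = record_stacks
        self._active = set()   # id() of each writer currently added
        self._history = {}     # id(writer) -> [(event name, stack or None)]
        self.suspicious = []   # (writer, history) per removal of a non-added writer

    def __getattr__(self, name):
        # Anything not overridden here goes straight to the wrapped reactor.
        return getattr(self._reactor, name)

    def _note(self, writer, event):
        stack = None
        if self._record_stacks:
            # Drop the wrapper's own two frames so the stack ends at the caller.
            stack = traceback.format_stack()[:-2]
        self._history.setdefault(id(writer), []).append((event, stack))

    def addWriter(self, writer):
        self._note(writer, "addWriter")
        self._active.add(id(writer))
        return self._reactor.addWriter(writer)

    def removeWriter(self, writer):
        self._note(writer, "removeWriter")
        if id(writer) not in self._active:
            # Keep the complete event stream for this writer for later logging.
            self.suspicious.append((writer, list(self._history[id(writer)])))
        self._active.discard(id(writer))
        return self._reactor.removeWriter(writer)


class FakeReactor(object):
    """Stand-in used only to exercise the wrapper without Twisted."""

    def __init__(self):
        self.writers = set()

    def addWriter(self, writer):
        self.writers.add(writer)

    def removeWriter(self, writer):
        self.writers.discard(writer)


audited = WriterAuditingReactor(FakeReactor())
w = object()
audited.addWriter(w)
audited.removeWriter(w)
audited.removeWriter(w)  # the suspected double removal
events = [event for event, _ in audited.suspicious[0][1]]
```

Two caveats: the history is keyed on id(), which can be reused once a 
writer is garbage collected, and it grows without bound, so a version 
meant for production would want to prune it.  Turning on record_stacks 
adds the call stack to each event, at some cost per call.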

Keep us updated. :)

Jean-Paul



