Opened 7 years ago

Last modified 9 months ago

#6373 defect new

Adding a new filedescriptor to epoll reactor should not result in ENOENT

Reported by: Itamar Turner-Trauring Owned by: Itamar Turner-Trauring
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch: branches/resurrected-fd-6373
branch-diff, diff-cov, branch-cov, buildbot
Author: itamarst

Description

In certain mysterious situations (see #6346) info from an old filedescriptor remains in the reactor state; this can break adding a new filedescriptor with the same fd number, e.g. resulting in a ENOENT.

This ticket covers dealing gracefully with this problem, effectively a workaround that mitigates the original issue (still being covered by #6346). A workaround is useful even if we solve #6346 insofar as it makes the reactor more robust against future bugs.

Change History (7)

comment:1 Changed 7 years ago by itamarst

Author: itamarst
Branch: branches/resurrected-fd-6373

(In [37532]) Branching to 'resurrected-fd-6373'

comment:2 Changed 7 years ago by Itamar Turner-Trauring

Keywords: review added

http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/resurrected-fd-6373 is running.

I punted the poll issue, discovered by the new test, to #6374.

comment:3 Changed 7 years ago by therve

Keywords: review removed
Owner: set to Itamar Turner-Trauring

Comments:

  • Maybe we should share some more code with poll. Ticket?
  • Maybe remove should catch the same problem. TIcket?
  • "close()ed" is really weird, I don't know if it's intentional. You should not use "should", too.
  • The removeReader call at the end of the test is most likely unneeded
  • The new test is failing for kqueue and cfreactor.
  • twistedchecker is complaining:
    ************* Module twisted.internet.epollreactor
    C0301:239,0: Line too long (84/79)
    ************* Module twisted.internet.test.test_fdset
    W9015:379,0: Too many blank lines, expected 2
    

comment:4 Changed 6 years ago by Itamar Turner-Trauring

kqueue and cfreactor can also be addressed by separate tickets, and until then just skip the test on those reactors.

comment:5 Changed 9 months ago by msdemlei

Looks like I'm running into a variant of this. After ~two weeks or so, a reasonably lively twisted.web instance starts throwing IOErrors in self._poller.modify -- and that's it, connections are refused and just yield IOErrors in the server log; that is on Debian stretch twisted, python-twisted 16.6.0, on python2.7 (yeah, I know).

The patch proposed here has never been merged, apparently for concerns it will hide bugs (or so I read the comment in epollreactor). Still, it also sucks if something is failing hard after running for two weeks without a clear way to debug things -- if this is really due to a stale file descriptor left lying around, the triggering code can have run at any time in the previous two weeks, right?

So:

(a) any chance something like this patch might go in after all? At least there'll be a log entry about the failure, so it's not that a bug becomes totally invisible.

(b) is there a good strategy to debug a situation in which such a bug is triggered?

PS: my tracebacks look like this:

Traceback (most recent call last):
	  File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 86, in callWithContext
	    return context.call({ILogContext: newCtx}, func, *args, **kw)
	  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
	    return self.currentContext().callWithContext(ctx, func, *args, **kw)
	  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
	    return func(*args,**kw)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 597, in _doReadOrWrite
	    why = selectable.doRead()
	--- <exception caught here> ---
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 1072, in doRead
	    transport = self.transport(skt, protocol, addr, self, s, self.reactor)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 789, in __init__
	    self.startReading()
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/abstract.py", line 434, in startReading
	    self.reactor.addReader(self)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/epollreactor.py", line 110, in addReader
	    EPOLLIN, EPOLLOUT)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/epollreactor.py", line 94, in _add
	    self._poller.modify(fd, flags)
	exceptions.IOError: [Errno 2] No such file or directory

comment:6 Changed 9 months ago by Jean-Paul Calderone

It looks like there is some unaddressed review feedback. The next steps would be to rescue the branch in question into git/github, address the review feedback, and re-submit the ticket for review.

comment:7 in reply to:  6 Changed 9 months ago by msdemlei

Replying to Jean-Paul Calderone:

It looks like there is some unaddressed review feedback. The next steps would be to rescue the branch in question into git/github, address the review feedback, and re-submit the ticket for review.

I'm afraid you already essentially lost me at the "rescue branch into github" part -- as in: is there a convenient way to do that or would I need to produce a PR from my end?

On the other hand, my impression was that the code patched has changed so much in the last six years that not much would be left of the patch after adapting it to the current code.

Anyway, Itamar, if you're reading this, do you remember why you abandoned the ticket?

Note: See TracTickets for help on using tickets.