Opened 4 years ago

Closed 3 years ago

#4539 defect closed fixed (fixed)

fileno() gets called on a ReadDescriptor after removing it from the reactor

Reported by: wulczer Owned by: exarkun
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch: branches/bad-fileno-call-4539-2
(diff, github, buildbot, log)
Author: exarkun Launchpad Bug:

Description

The attached test case fails on HEAD twisted.

The use case is an object that wants to implement the IReadDescriptor interface and is getting added and removed to the reactor using reactor.addReader() and .removeReader().

More specifically, it is added to the reactor and after it becomes no longer usable (say, it has been closed by the user, or disposed in another way) and at which point fileno() can no longer return sensible values, it is removed from the reactor.

However fileno() still can get called on the object even after removing it from the reactor, hence the need to return, lacking any sensible values, -1 (the reason returning -1 works is explained in #2825, I believe). Since handling a -1 fileno is considered a crock (and rightfully so), this has to be considered a defect.

Attachments (1)

test_foo.py (1.0 KB) - added by wulczer 4 years ago.
failing test case

Download all attachments as: .zip

Change History (25)

Changed 4 years ago by wulczer

failing test case

comment:1 Changed 4 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/bad-fileno-call-4539

(In [29610]) Branching to 'bad-fileno-call-4539'

comment:2 Changed 4 years ago by exarkun

  • Keywords review added
  • Owner glyph deleted

I deleted the code trying to handle -1s.

Build results

It's possible I should have changed something in abstract.FileDescriptor.fileno as well, but... what?

comment:3 Changed 4 years ago by itamar

  • Keywords review removed
  • Owner set to exarkun

I agree that the tests for -1 or non-matching are unnecessary - that's a seeming duplication of what the return value from doRead/doWrite is for.

How about having abstract.FileDescriptor.fileno() raise a RuntimeError("%s does not have a file descriptor." % (self,))? Worst case buggy code will get this error logged, the FileDescriptor will be removed, and all will be well.

Also, the test fails on Windows, according to the build results, in an interesting way that may inform the design discussion (http://buildbot.twistedmatrix.com/builders/winxp32-py2.6-select/builds/590/steps/select/logs/problems)

comment:4 Changed 4 years ago by exarkun

  • Keywords review added
  • Owner changed from exarkun to glyph

I agree that the tests for -1 or non-matching are unnecessary - that's a seeming duplication of what the return value from doRead/doWrite is for.

We discussed this on IRC at length. foom proposed the notion that anything we can do to avoid weird file descriptor (particularly accidental re-use-style) bugs, or make them easier to track down, is a good thing. This, combined with the weight of the backwards compatibility argument, suggests to me that we should just leave the -1 handling in place, at least for the time being.

How about having abstract.FileDescriptor.fileno() raise a RuntimeError("%s does not have a file descriptor." % (self,))? Worst case buggy code will get this error logged, the FileDescriptor will be removed, and all will be well.

I started on this. Needs tests, though. Ended up backing out the changes to try to limit the damage here. Not clear I succeeded in that strategy, but I left the changes out anyway. I have them in a patch file on my hard drive though.

Also, the test fails on Windows, according to the build results,

Fixed. Was pretty boring. Just caused by using pipes instead of sockets.

So, issues that are confusing me:

  • IFileDescriptor is incompletely specified. Can fileno return -1? Can it raise an exception? Can it change its return value between two positive integers over the lifetime of the instance?
  • IReadDescriptor.doRead and IWriteDescriptor.doWrite are incompletely specified. What do their return values mean?
  • Four nearly identical copies of _doReadOrWrite in four different reactors.
  • The new test_fdset tests seems kind of sucky to me.
  • untested, potentially bogus, error handling code sprinkled throughout

Clearly the branch is unfinished. But perhaps you can give me some direction.

comment:5 Changed 4 years ago by itamar

  • Keywords review removed
  • Owner changed from glyph to exarkun
  1. I guess we should be defining the interfaces for IFileDescriptor and IRead/IWriteDescriptor better, by expanding the interface docstrings to be complete. Whatever abstract.FileDescriptor subclasses do now should probably be the norm (so e.g. doWrite or doRead can return None or a Failure indicating disconnection, IIRC), but see below on fileno. In general I'd say that the interface should be restrictive in what it allows, whereas the reactor that accepts it should be lenient in what it accepts (although logging complaints against badly written descriptors seems fine).
  1. As for fileno: changing return value between two integers? Seems like bad idea, and so should be disallowed. -1 seems like a hack, so I'd say should be disallowed.

How about this: when fileno has no fd, e.g. is closed, it must raise ConnectionFdescWentAway. abstract.FileDescriptor.fileno should do this by default. -1 should still be recognized by reactor as meaning disconnection, but should result in deprecation warning. You could additionally check for changed fd from fileno, which should presumably result in error being logged.

  1. Merging the _doReadOrWrite functions is a great idea.
  1. Checking for fileno first is a good idea - slightly less efficient, but does seem more likely to catch messed up FileDescriptors.
  1. test_lostFileDescriptor: "that means epollreactor will call any methods" - was this supposed to be "will *not* call"?
  1. Can you explain a bit more what you changed in selectreactor? AFAICT just removed the while loop, and the fileno() != -1 test which we determined was unnecessary?
  1. What bothers you about new tests?

comment:6 Changed 4 years ago by exarkun

  • Keywords review added
  • Owner changed from exarkun to glyph
  1. I guess we should be defining the interfaces for IFileDescriptor and IRead/IWriteDescriptor better

Yea. I went ahead and did that in the branch.

  1. As for fileno: changing return value between two integers? Seems like bad idea, and so should be disallowed. -1 seems like a hack, so I'd say should be disallowed.

I agree with disallowing -1, but I think we should leave it for now.

  1. test_lostFileDescriptor: "that means epollreactor will call any methods" - was this supposed to be "will *not* call"?

Yes, thanks, fixed.

  1. Can you explain a bit more what you changed in selectreactor? AFAICT just removed the while loop, and the fileno() != -1 test which we determined was unnecessary?

Exactly correct. The loop in doSelect just saved us a trip back through runUntilCurrent in the error case. This version will be a little slower, but it's also more correct, because it will actually let the reactor stop if you call reactor.stop() in the connectionLost method of a selectable that triggers this error handling behavior.

And the fileno() != -1 check wasn't necessary because select performs the same check for us.

  1. What bothers you about new tests?

They seem complicated and fragile. Although I'm feeling a little better about them now. Maybe they have to be complicated, because the reactor guts are dealing with complicated cases.

Thanks for the review.

comment:7 Changed 4 years ago by itamar

  • Keywords review removed
  • Owner changed from glyph to exarkun
  1. Missing news file.
  1. The way the _doReadOrWrite function decides between CONNECTION_LOST and CONNECTION_DONE is... strange. At the very least there should be an explanatory comment -- however, if you can't give a good answer, maybe you should just open a new ticket for that.
  1. I'm still not thrilled with supporting -1... but it does seem to be the actual current API. Perhaps you should also open a ticket for a new, better FileDescriptor/reactor integration API, if one doesn't exist (which would include James' proposal for a general registration API separate from addReader/addWriter).
  1. test_negativeOneFileDescriptor has no assertions. You could for example assert that only the correct byte values were received, which would demonstrate doRead was called appropriate number of times.
  1. What happens if you have a test like test_lostFileDescriptor, but you additionally add the recreated descriptor to the reactor?

comment:8 Changed 4 years ago by exarkun

  • Keywords review added
  • Owner changed from exarkun to glyph
  1. Missing news file.

Yea, I'll leave that for later, once I actually know what the branch is going to do. ;)

  1. The way the _doReadOrWrite function decides between CONNECTION_LOST and CONNECTION_DONE is... strange.

I guess so. But it's correct, as far as I know. Possibly it needs better test coverage, I'm not sure.

  1. test_negativeOneFileDescriptor has no assertions.

Added one.

  1. What happens if you have a test like test_lostFileDescriptor, but you additionally add the recreated descriptor to the reactor?

You probably won't be able to most of the time. The old descriptor will still have a key in the applicable internal mapping (_reads or _writes mostly) so the addReader/addWriter call will just be ignored.

comment:9 Changed 4 years ago by itamar

  • Keywords review removed
  • Owner changed from glyph to exarkun

Following up on same points:

  1. If it makes sense to you, an explanatory comment may be sufficient. I see no explanation the poll manpage at least. (And as aside, looking through poll man page I noticed the POLLNVAL flag, which perhaps we should be utilizing.)
  1. Looks good.
  1. Perhaps we should have a test for adding a FileDescriptor with duplicate fileno to a reactor? The addReader/addWriter should probably fail loudly.

comment:10 Changed 4 years ago by exarkun

  • Keywords review added
  • Owner changed from exarkun to glyph
  1. If it makes sense to you, an explanatory comment may be sufficient.

Well, I think it's right at least. I'll leave aside discussions of its sensibility. ;) I added a comment in r29801.

  1. Perhaps we should have a test for adding a FileDescriptor with duplicate fileno to a reactor? The addReader/addWriter should probably fail loudly.

The trouble is that it's currently silently allowed and I suspect a non-trivial amount of code relies on this. We could try discouraging it for a while with a warning and eventually make it an error, I guess. If that makes sense, I'll file a separate ticket.

comment:11 Changed 4 years ago by exarkun

There was another ticket about the _doReadOrWrite refactoring, #4357.

comment:12 Changed 4 years ago by itamar

  • Keywords review removed
  • Owner changed from glyph to exarkun
  1. Comment helps, but "If we were reading from the descriptor then this is a clean shutdown." - why is this the case?
  1. Erk. Another ticket may be good idea, yes.

comment:13 Changed 4 years ago by itamar

Oh, and a typo in the comments: "selectalbe"

comment:14 Changed 4 years ago by exarkun

Comment helps, but "If we were reading from the descriptor then this is a clean shutdown." - why is this the case?

Well, that's a tricky one, I guess. There is a unit test that fails if I mess around with this code, but only one. I agree it's not directly evident from the man page why this is the case. I tried to demonstrate the correctness of my own understanding for a little while and didn't exactly succeed. Probably we need several more unit tests in this area, but I'm not sure I feel like writing them right now. ;)

comment:15 Changed 4 years ago by itamar

So... another ticket then, I guess (and maybe James knows the answer, he usually does.)

Which means: two new tickets, a typo to fix, and a news file.

comment:16 Changed 4 years ago by exarkun

Filed #4604 and #4605.

comment:17 Changed 4 years ago by exarkun

(In [29898]) handle socket.error in addition to IOError

refs #4539

comment:18 Changed 4 years ago by exarkun

(In [30134]) Attempt to implement the fileno/exception/-1 behavior for win32er

refs #4539

comment:19 Changed 4 years ago by glyph

build results in case somebody wants to pick this up.

comment:20 Changed 4 years ago by <automation>

  • Owner exarkun deleted

comment:21 Changed 3 years ago by exarkun

  • Branch changed from branches/bad-fileno-call-4539 to branches/bad-fileno-call-4539-2

(In [31655]) Branching to 'bad-fileno-call-4539-2'

comment:22 Changed 3 years ago by exarkun

  • Keywords review added

I changed the fileno() == -1 test to skip on win32eventreactor, which never calls fileno on its descriptors. Latest build results look okay.

comment:23 Changed 3 years ago by therve

  • Keywords review removed
  • Owner set to exarkun

This is definitely tricky, but reading the backlog gave me most of the information, I think. Please merge.

comment:24 Changed 3 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [31676]) Merge bad-fileno-call-4539-2

Author: wulczer, exarkun
Reviewer: itamar, therve
Fixes: #4539

Refactor the event dispatch method of all of the poll-like reactors into
mixin shared by all those reactors. Also change that dispatch logic so
that fileno is not called on descriptors after they have removed
themselves from the reactor.

Also change selectreactor to remove the loop inside doSelect and instead
let this looping happen via the main reactor loop. This fixes a bug where
the reactor might fail to stop sometimes.

Note: See TracTickets for help on using tickets.