Opened 12 years ago

Closed 12 years ago

#4352 defect closed fixed (fixed)

twisted.test.test_process.ProcessTestCase.testStdio and pollreactor don't get along

Reported by: Jean-Paul Calderone Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: jknight Branch: branches/poll-stdio-halfclose-4352
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun


If the helper program used by testStdio is changed to use pollreactor:

Index: twisted/test/
--- twisted/test/     (revision 28664)
+++ twisted/test/     (working copy)
@@ -10,6 +10,8 @@
 sys.path.insert(0, os.curdir)
 ### end of preamble
+from twisted.internet import pollreactor
 from twisted.python import log
 from zope.interface import implements

then the test fails:

[FAIL]: twisted.test.test_process.ProcessTestCase.testStdio

Traceback (most recent call last):
  File "/home/exarkun/Projects/Twisted/trunk/twisted/test/", line 501, in processEnded
    "%s\n" % (p.outF.getvalue(), p.errF.getvalue()))
twisted.trial.unittest.FailTest: Output follows:

Error message from process_twisted follows:
2010-03-05 22:46:52-0500 [-] Log opened.
2010-03-05 22:46:52-0500 [-] connection made
2010-03-05 22:46:52-0500 [-] connectionLost [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
2010-03-05 22:46:52-0500 [-] ]
2010-03-05 22:46:52-0500 [-] Main loop terminated.

not equal:
a = ''
b = 'hello, worldabc123'


The child receives the bytes written to it, but apparently it shuts down the reactor too quickly and the parent process never receives the results.

I'm not sure if this is just a test bug or if it's actually a misbehavior of the reactor. The failing test is blocking #2234, though, which aims to make poll the default reactor on some platforms.

Change History (15)

comment:1 Changed 12 years ago by Jean-Paul Calderone

Cc: jknight added

So what the test does is basically this:

  1. Spawns a child process
  2. Writes to the child's stdin
  3. Closes the child's stdin
  4. Collects the child's stdout
  5. Makes sure it's the expected value

Meanwhile the child:

  1. Reads from its stdin and echos it back to its stdout
  2. When it gets readConnectionLost it calls loseConnection on its transport
  3. When it gets connectionLost it calls reactor.stop()

When the child process uses select, this works because when stdin is closed, SelectReactor._doReadOrWrite in the child process knows that it is monitoring the file descriptor for read events. It passes True for inRead to _disconnectSelectable. This, along with a reason of ConnectionDone (which correctly returns from Connection.doRead when it discovers a readable socket with no more bytes to read) results in readConnectionLost being called on the protocol, rather than merely connectionLost.

When the child process uses poll, this fails because when stdin is closed, it has only POLLHUP set and PollReactor._doReadOrWrite doesn't pay attention to anything else when deciding whether to set inRead to True or False. So it passes False to _disconnectSelectable. Plus, it treats POLLHUP as an an error condition and uses ConnectionLost as the reason, rather than ConnectionDone. Each of these behaviors by itself is sufficient to cause _disconnectSelectable to call connectionLost rather than readConnectionLost. This results in the reactor being stopped earlier, presumably leading to the bytes being left in the Twisted send buffer rather than delivered to the parent.

So this is a real pollreactor bug - stdio halfclose is not properly supported (#1742). I think it is not a test bug, because transport.write is supposed to be reliable (as much as possible given by the underlying transport), even if you call transport.loseConnection before the I/O has actually taken place.

comment:2 Changed 12 years ago by Jean-Paul Calderone

Oops. I said Connection.doRead takes care of turning an empty read into a ConnectionDone. This is stdio, so actually it's ProcessReader.doRead, which actually just delegates to twisted.internet.fdesc.readFromFD. However, that does the same thing as Connection.doRead, so the conclusion remains the same.

I also wonder why POLLHUP comes back from the poll(2) call. The man page says (output only) for this flag. I suppose pipes are special, as usual.

comment:3 Changed 12 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/poll-stdio-halfclose-4352

(In [28665]) Branching to 'poll-stdio-halfclose-4352'

comment:4 Changed 12 years ago by Jean-Paul Calderone

(In [28666]) Add a halfclose stdio test

refs #4352 refs #1742

comment:5 Changed 12 years ago by Jean-Paul Calderone

(In [28668]) Fix the new test in pollreactor and epollreactor

refs #4352 refs #1742

comment:6 Changed 12 years ago by Jean-Paul Calderone

pollreactor, epollreactor, and gtk2reactor all had variations of this bug. I fixed them all in similar ways. Unfortunately there's a lot of duplication. :/

I expect to close #1742 when this branch is merged, too.

Build results

comment:7 Changed 12 years ago by jknight

I'm kinda wondering if it's actually reasonable to have generic behavior in *poll reactors for all fd types, given how differently different object types behave.

comment:8 Changed 12 years ago by Jean-Paul Calderone

That's an interesting line of consideration. It seems to make a lot of sense. If we moved this logic into the IReadDescriptor and IWriteDescriptor implementations, then the pipe-based implementations could be responsible for knowing about this quirk.

One challenge with making this change is that IReadDescriptor and IWriteDescriptor aren't expressive enough to allow the reactor to tell them about everything going on at the poll(2) level. I'm also not exactly sure how we would widen this interface, except in the way which exactly accommodates this particular case. Hmm. Although since it works with SelectReactor, maybe that's not really true. If we just called doRead when poll told us POLLHUP then maybe everything would work out okay...

On the other hand, it's clearly a bug in all of the affected reactor implementations that they were passing inRead = False to _disconnectSelectable in this case. So that part of the fix should remain.

comment:9 Changed 12 years ago by Jean-Paul Calderone

Well, all the tests still pass with this change made to the branch:

  • twisted/internet/

    205205        if event & _POLL_DISCONNECTED and not (event & _epoll.IN):
    206206            if fd in self._reads:
    207207                inRead = True
    208                 why = CONNECTION_DONE
     208                why = selectable.doRead()
    209209            else:
    210210                why = CONNECTION_LOST
    211211        else:
  • twisted/internet/

    171171        inRead = False
    172172        if event & POLL_DISCONNECTED and not (event & POLLIN):
    173173            if fd in self._reads:
    174                 why = main.CONNECTION_DONE
    175174                inRead = True
     175                why = selectable.doRead()
    176176            else:
    177177                why = main.CONNECTION_LOST
    178178        else:
  • twisted/internet/

    278278        inRead = False
    279279        if condition & POLL_DISCONNECTED and not (condition & gobject.IO_IN):
    280280            if source in self._reads:
    281                 why = main.CONNECTION_DONE
    282281                inRead = True
     282                why = source.doRead()
    283283            else:
    284284                why = main.CONNECTION_LOST
    285285        else:

but it's not much of a simplification. I might prefer to wait to make the change until there's a concrete benefit to doing so. I dunno.

comment:10 Changed 12 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Glyph deleted

comment:11 Changed 12 years ago by therve

Keywords: review removed
Owner: set to Jean-Paul Calderone
  • Pyflakes: twisted/test/ 'ConnectionDone' imported but unused
  • the test file is also missing a couple of blank lines.
  • one thing that the gtk2reactor code does better than the others is only set inRead to True, and keep the default value otherwise. It would be nice to change poll/epoll to do the same.

Otherwise it looks good. The duplication is unfortunate, but I don't want to think about that now. Please merge!

comment:12 Changed 12 years ago by Jean-Paul Calderone

(In [28678]) Remove unused import; add a class docstring to HalfCloseProtocol; insert some missing vertical whitespace

refs #4352

comment:13 Changed 12 years ago by Jean-Paul Calderone

Thanks for the review. I fixed the first two points. For the third, I think this may actually be a bug in gtk2reactor. It's hard to tell, since no tests exercise it: but if a selectable is both readable and writeable, and the write signals an error, then probably inRead should be set back to False, because a write error must mean the connection is completely gone, not half-closed.

I'm going to leave the relevant code alone for now, but file a ticket for refactoring this duplication and determining whether this is really a bug or if the refactored implementation can do what gtk2reactor currently does.

comment:14 Changed 12 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [28679]) Merge poll-stdio-halfclose-4352

Author: exarkun Reviewer: therve Fixes: #1742 Fixes: #4352

Fix support for stdio halfclose in pollreactor, epollreactor, and gtk2reactor. Also add a test for stdio halfclose that any reactor can run.

comment:15 Changed 11 years ago by <automation>

Owner: Jean-Paul Calderone deleted
Note: See TracTickets for help on using tickets.