Opened 4 years ago

Closed 4 years ago

#4352 defect closed fixed (fixed)

twisted.test.test_process.ProcessTestCase.testStdio and pollreactor don't get along

Reported by: exarkun Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: jknight Branch: branches/poll-stdio-halfclose-4352
(diff, github, buildbot, log)
Author: exarkun Launchpad Bug:

Description

If the helper program used by testStdio is changed to use pollreactor:

Index: twisted/test/process_twisted.py
===================================================================
--- twisted/test/process_twisted.py     (revision 28664)
+++ twisted/test/process_twisted.py     (working copy)
@@ -10,6 +10,8 @@
 sys.path.insert(0, os.curdir)
 ### end of preamble
 
+from twisted.internet import pollreactor
+pollreactor.install()
 
 from twisted.python import log
 from zope.interface import implements

then the test fails:

===============================================================================
[FAIL]: twisted.test.test_process.ProcessTestCase.testStdio

Traceback (most recent call last):
  File "/home/exarkun/Projects/Twisted/trunk/twisted/test/test_process.py", line 501, in processEnded
    "%s\n" % (p.outF.getvalue(), p.errF.getvalue()))
twisted.trial.unittest.FailTest: Output follows:

Error message from process_twisted follows:
2010-03-05 22:46:52-0500 [-] Log opened.
2010-03-05 22:46:52-0500 [-] connection made
2010-03-05 22:46:52-0500 [-] connectionLost [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
2010-03-05 22:46:52-0500 [-] ]
2010-03-05 22:46:52-0500 [-] Main loop terminated.


not equal:
a = ''
b = 'hello, worldabc123'

-------------------------------------------------------------------------------

The child receives the bytes written to it, but apparently it shuts down the reactor too quickly and the parent process never receives the results.

I'm not sure if this is just a test bug or if it's actually a misbehavior of the reactor. The failing test is blocking #2234, though, which aims to make poll the default reactor on some platforms.

Change History (15)

comment:1 Changed 4 years ago by exarkun

  • Cc jknight added

So what the test does is basically this:

  1. Spawns a child process
  2. Writes to the child's stdin
  3. Closes the child's stdin
  4. Collects the child's stdout
  5. Makes sure it's the expected value

Meanwhile the child:

  1. Reads from its stdin and echos it back to its stdout
  2. When it gets readConnectionLost it calls loseConnection on its transport
  3. When it gets connectionLost it calls reactor.stop()

When the child process uses select, this works because when stdin is closed, SelectReactor._doReadOrWrite in the child process knows that it is monitoring the file descriptor for read events. It passes True for inRead to _disconnectSelectable. This, along with a reason of ConnectionDone (which tcp.py correctly returns from Connection.doRead when it discovers a readable socket with no more bytes to read) results in readConnectionLost being called on the protocol, rather than merely connectionLost.

When the child process uses poll, this fails because when stdin is closed, it has only POLLHUP set and PollReactor._doReadOrWrite doesn't pay attention to anything else when deciding whether to set inRead to True or False. So it passes False to _disconnectSelectable. Plus, it treats POLLHUP as an an error condition and uses ConnectionLost as the reason, rather than ConnectionDone. Each of these behaviors by itself is sufficient to cause _disconnectSelectable to call connectionLost rather than readConnectionLost. This results in the reactor being stopped earlier, presumably leading to the bytes being left in the Twisted send buffer rather than delivered to the parent.

So this is a real pollreactor bug - stdio halfclose is not properly supported (#1742). I think it is not a test bug, because transport.write is supposed to be reliable (as much as possible given by the underlying transport), even if you call transport.loseConnection before the I/O has actually taken place.

comment:2 Changed 4 years ago by exarkun

Oops. I said Connection.doRead takes care of turning an empty read into a ConnectionDone. This is stdio, so actually it's ProcessReader.doRead, which actually just delegates to twisted.internet.fdesc.readFromFD. However, that does the same thing as Connection.doRead, so the conclusion remains the same.

I also wonder why POLLHUP comes back from the poll(2) call. The man page says (output only) for this flag. I suppose pipes are special, as usual.

comment:3 Changed 4 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/poll-stdio-halfclose-4352

(In [28665]) Branching to 'poll-stdio-halfclose-4352'

comment:4 Changed 4 years ago by exarkun

(In [28666]) Add a halfclose stdio test

refs #4352
refs #1742

comment:5 Changed 4 years ago by exarkun

(In [28668]) Fix the new test in pollreactor and epollreactor

refs #4352
refs #1742

comment:6 Changed 4 years ago by exarkun

pollreactor, epollreactor, and gtk2reactor all had variations of this bug. I fixed them all in similar ways. Unfortunately there's a lot of duplication. :/

I expect to close #1742 when this branch is merged, too.

Build results

comment:7 Changed 4 years ago by jknight

I'm kinda wondering if it's actually reasonable to have generic behavior in *poll reactors for all fd types, given how differently different object types behave.

comment:8 Changed 4 years ago by exarkun

That's an interesting line of consideration. It seems to make a lot of sense. If we moved this logic into the IReadDescriptor and IWriteDescriptor implementations, then the pipe-based implementations could be responsible for knowing about this quirk.

One challenge with making this change is that IReadDescriptor and IWriteDescriptor aren't expressive enough to allow the reactor to tell them about everything going on at the poll(2) level. I'm also not exactly sure how we would widen this interface, except in the way which exactly accommodates this particular case. Hmm. Although since it works with SelectReactor, maybe that's not really true. If we just called doRead when poll told us POLLHUP then maybe everything would work out okay...

On the other hand, it's clearly a bug in all of the affected reactor implementations that they were passing inRead = False to _disconnectSelectable in this case. So that part of the fix should remain.

comment:9 Changed 4 years ago by exarkun

Well, all the tests still pass with this change made to the branch:

  • twisted/internet/epollreactor.py

     
    205205        if event & _POLL_DISCONNECTED and not (event & _epoll.IN): 
    206206            if fd in self._reads: 
    207207                inRead = True 
    208                 why = CONNECTION_DONE 
     208                why = selectable.doRead() 
    209209            else: 
    210210                why = CONNECTION_LOST 
    211211        else: 
  • twisted/internet/pollreactor.py

     
    171171        inRead = False 
    172172        if event & POLL_DISCONNECTED and not (event & POLLIN): 
    173173            if fd in self._reads: 
    174                 why = main.CONNECTION_DONE 
    175174                inRead = True 
     175                why = selectable.doRead() 
    176176            else: 
    177177                why = main.CONNECTION_LOST 
    178178        else: 
  • twisted/internet/gtk2reactor.py

     
    278278        inRead = False 
    279279        if condition & POLL_DISCONNECTED and not (condition & gobject.IO_IN): 
    280280            if source in self._reads: 
    281                 why = main.CONNECTION_DONE 
    282281                inRead = True 
     282                why = source.doRead() 
    283283            else: 
    284284                why = main.CONNECTION_LOST 
    285285        else: 

but it's not much of a simplification. I might prefer to wait to make the change until there's a concrete benefit to doing so. I dunno.

comment:10 Changed 4 years ago by exarkun

  • Keywords review added
  • Owner glyph deleted

comment:11 Changed 4 years ago by therve

  • Keywords review removed
  • Owner set to exarkun
  • Pyflakes: twisted/test/stdio_test_halfclose.py:17: 'ConnectionDone' imported but unused
  • the test file is also missing a couple of blank lines.
  • one thing that the gtk2reactor code does better than the others is only set inRead to True, and keep the default value otherwise. It would be nice to change poll/epoll to do the same.

Otherwise it looks good. The duplication is unfortunate, but I don't want to think about that now. Please merge!

comment:12 Changed 4 years ago by exarkun

(In [28678]) Remove unused import; add a class docstring to HalfCloseProtocol; insert some missing vertical whitespace

refs #4352

comment:13 Changed 4 years ago by exarkun

Thanks for the review. I fixed the first two points. For the third, I think this may actually be a bug in gtk2reactor. It's hard to tell, since no tests exercise it: but if a selectable is both readable and writeable, and the write signals an error, then probably inRead should be set back to False, because a write error must mean the connection is completely gone, not half-closed.

I'm going to leave the relevant code alone for now, but file a ticket for refactoring this duplication and determining whether this is really a bug or if the refactored implementation can do what gtk2reactor currently does.

comment:14 Changed 4 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [28679]) Merge poll-stdio-halfclose-4352

Author: exarkun
Reviewer: therve
Fixes: #1742
Fixes: #4352

Fix support for stdio halfclose in pollreactor, epollreactor, and gtk2reactor.
Also add a test for stdio halfclose that any reactor can run.

comment:15 Changed 3 years ago by <automation>

  • Owner exarkun deleted
Note: See TracTickets for help on using tickets.