Opened 11 years ago

Closed 11 years ago

#2371 defect closed fixed (fixed)

twisted.conch.stdio.runWithProtocol writes to non-blocking file descriptor incorrectly

Reported by: Jean-Paul Calderone Owned by:
Priority: highest Milestone:
Component: conch Keywords:
Cc: therve Branch:


At this line:

        os.write(0, "\r\x1bc\r")

0 is in non-blocking mode. This means the write may fail or be incomplete. The return value must be checked or the file descriptor should be set to be blocking.

This may be related to #2123. Or it may not be. Whether the file descriptor is in blocking mode or non-blocking mode, the write succeeds completely for that test; however, when it is in blocking mode, I have not seen the test fail, whereas when it is in non-blocking mode it fails at least 1 time in 10 when run on a heavily loaded OS X machine.

Change History (15)

comment:1 Changed 11 years ago by Jean-Paul Calderone

I also noticed that the change I thought I had committed a while ago still isn't actually in Twisted:

  • twisted/conch/

    5252            self.proto = None
    5454class ConsoleManhole(ColoredManhole):
    55     def handle_QUIT(self):
    56         ColoredManhole.handle_QUIT(self)
     55    def connectionLost(self, reason):
    5756        reactor.stop()
    5958def runWithProtocol(klass):

comment:2 Changed 11 years ago by Jean-Paul Calderone

Keywords: review added
Owner: z3p deleted
Priority: normalhighest

I believe this is fixed in ctrl+\-2371+2123.

As the branch name suggests, this is related to #2123. The three changes in this branch together solve this problem by:

  • not stopping the reactor immediately in response to C-\, but instead waiting for the transport to disconnect and then stopping it. The transport will wait until it has pushed all of its bytes into the kernel before actually disconnecting. This ensures any application-level writes outstanding are processed.
  • setting the output fd to blocking so that the os.write in runWithProtocol will always write the string fully to the kernel.
  • draining the tty after writing the clear screen code before exiting so that the peer is guaranteed to have received it.

Overall, only the first two changes are correct, at best. The third change isn't actually necessary to fix this problem on OS X (only the second change is - but for no reason remotely related to the obvious). Perhaps the tcdrain should be dropped entirely, in fact.

comment:3 Changed 11 years ago by Glyph

I don't think I understand the issue quite well enough to give a review.

I believe the code is in the wrong place, however. It ought to be in twisted.internet._posixstdio, or perhaps even a new special case module just for OS X's particular brokennesses.

A reactor shutdown hook or atexit handler both seem more appropriate to me, and they ought to be hooked from the StandardIO handler because there is still an unresolved bug here, I think.

comment:4 Changed 11 years ago by jknight

Owner: set to jknight
Status: newassigned

comment:5 Changed 11 years ago by Jean-Paul Calderone

Okay I got rid of the tcdrain and moved the setNonBlocking call into the... process... code.

comment:6 Changed 11 years ago by jknight

Owner: changed from jknight to Jean-Paul Calderone
Status: assignednew

Sorry about taking so long...

It looks like there's no test case here. I thought you said you had made a minimal program which showed the same bug, so it'd be good if you could include that here.

Other than that it looks fine.

It would be nice to know the bug can be fixed from the other side (that is, the one doing the reading).

comment:7 Changed 11 years ago by Jean-Paul Calderone

Keywords: review removed

comment:8 Changed 11 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Okay. I've added a simplified version of the failing conch test to twisted.test. It fails about one time in ten on neutron without the fix, and with the fix in a trial -u run I left going overnight, didn't fail after more than 100k iterations.

I took the opportunity to add some test coverage for ProcessProtocol as well, since I ended up fixing an inconsistency between PTY processes and regular processes in the course of writing the above mentioned test.

This also led me to delete SignalMixin, since it has been pointless for a while and was near the code I was changing.

comment:9 Changed 11 years ago by radix

Keywords: review removed
Owner: set to Jean-Paul Calderone

It looks like test_lastWriteReceived could be way more trivial. I see no reason for it to be interactive (with its subprocess) at all; it seems you should just be able to make the subprocess write a single byte on its stdout, and make sure that you receive that byte in the test.

comment:10 Changed 11 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Okay, I simplified test_lastWriteReceived as far as I could. It still fails non-deterministically, but much less frequently now. I put a comment into the test explaining how to make it fail more frequently.

comment:11 Changed 11 years ago by therve

Keywords: review removed
Owner: set to Jean-Paul Calderone

A few comments:

  • test_process needs to be merged forward (sorry...)
  • pyflakes complains about test_mail
  • test_stdio needs to be skipped for windows

comment:12 Changed 11 years ago by Jean-Paul Calderone

Cc: therve added
Keywords: review added
Owner: Jean-Paul Calderone deleted


comment:13 Changed 11 years ago by therve

Keywords: review removed
Owner: set to Jean-Paul Calderone

I think this is mostly good. This a huge branch so I hope I didn't forget anything:

  • The change of Crypto imports doesn't solve anything, because conch/ssh/ still import it. Not really a problem of its own, though.
  • I don't know the policy of copyright statements. I saw you change to '2001-2004,2007', I don't know if '2001-2007' wouldn't be better. Anyhow it should be done consistently in all files.
  • The new docstring of test_iutils.UtilsTestCase is better than nothing, but it would be even better it it mentioned getProcessOutput.

Please merge when OK on all this.

Thanks for the great work !

comment:14 Changed 11 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [19640]) Merge ctrl+backslash-2371+2123-5

Author: exarkun Reviewer: jknight, radix, therve Fixes #2371, #2123, #2378

Remove the long-since obsolete test helper SignalMixin, which set up and tore down special signal handling state. For some time, trial has taken care of this detail. (#2378)

Change the process file descriptor "connection lost" code to reverse the setNonBlocking operation done during initialization, leaving the file descriptor in the blocking state (what it probably was before it was given to the process code). This works around a bug with PTYs on OS X where the last write made to a non-blocking PTY might be lost if the process exits before the data is read from the other end of the PTY. This also corrects a (perhaps unreasonable) assumption that blocking writes may be made to stdout after twisted.internet.stdio has been used. (#2371)

Change ConsoleManhole so that it waits for connectionLost notification before stopping the reactor. This avoids the possibility that the reactor will stop before its transport has had all of its output flushed. Combined with the fix for #2371, this fixes #2123.

comment:15 Changed 7 years ago by <automation>

Owner: Jean-Paul Calderone deleted
Note: See TracTickets for help on using tickets.