Opened 8 years ago

Closed 8 years ago

#2371 defect closed fixed (fixed)

twisted.conch.stdio.runWithProtocol writes to non-blocking file descriptor incorrectly

Reported by: exarkun Owned by:
Priority: highest Milestone:
Component: conch Keywords:
Cc: therve Branch:
Author: Launchpad Bug:

Description

At this line:

        os.write(0, "\r\x1bc\r")

0 is in non-blocking mode. This means the write may fail or be incomplete. The return value must be checked or the file descriptor should be set to be blocking.

This may be related to #2123. Or it may not be. Whether the file descriptor is in blocking mode or non-blocking mode, the write succeeds completely for that test; however, when it is in blocking mode, I have not seen the test fail, whereas when it is in non-blocking mode it fails at least 1 time in 10 when run on a heavily loaded OS X machine.

Change History (15)

comment:1 Changed 8 years ago by exarkun

I also noticed that the change I thought I had committed a while ago still isn't actually in Twisted:

  • twisted/conch/stdio.py

     
    5252            self.proto = None 
    5353 
    5454class ConsoleManhole(ColoredManhole): 
    55     def handle_QUIT(self): 
    56         ColoredManhole.handle_QUIT(self) 
     55    def connectionLost(self, reason): 
    5756        reactor.stop() 
    5857 
    5958def runWithProtocol(klass): 

comment:2 Changed 8 years ago by exarkun

  • Keywords review added
  • Owner z3p deleted
  • Priority changed from normal to highest

I believe this is fixed in ctrl+\-2371+2123.

As the branch name suggests, this is related to #2123. The three changes in this branch together solve this problem by:

  • not stopping the reactor immediately in response to C-\, but instead waiting for the transport to disconnect and then stopping it. The transport will wait until it has pushed all of its bytes into the kernel before actually disconnecting. This ensures any application-level writes outstanding are processed.
  • setting the output fd to blocking so that the os.write in runWithProtocol will always write the string fully to the kernel.
  • draining the tty after writing the clear screen code before exiting so that the peer is guaranteed to have received it.

Overall, only the first two changes are correct, at best. The third change isn't actually necessary to fix this problem on OS X (only the second change is - but for no reason remotely related to the obvious). Perhaps the tcdrain should be dropped entirely, in fact.

comment:3 Changed 8 years ago by glyph

I don't think I understand the issue quite well enough to give a review.

I believe the code is in the wrong place, however. It ought to be in twisted.internet._posixstdio, or perhaps even a new special case module just for OS X's particular brokennesses.

A reactor shutdown hook or atexit handler both seem more appropriate to me, and they ought to be hooked from the StandardIO handler because there is still an unresolved bug here, I think.

comment:4 Changed 8 years ago by jknight

  • Owner set to jknight
  • Status changed from new to assigned

comment:5 Changed 8 years ago by exarkun

Okay I got rid of the tcdrain and moved the setNonBlocking call into the... process... code.

comment:6 Changed 8 years ago by jknight

  • Owner changed from jknight to exarkun
  • Status changed from assigned to new

Sorry about taking so long...

It looks like there's no test case here. I thought you said you had made a minimal program which showed the same bug, so it'd be good if you could include that here.

Other than that it looks fine.

It would be nice to know the bug can be fixed from the other side (that is, the one doing the reading).

comment:7 Changed 8 years ago by exarkun

  • Keywords review removed

comment:8 Changed 8 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Okay. I've added a simplified version of the failing conch test to twisted.test. It fails about one time in ten on neutron without the fix, and with the fix in a trial -u run I left going overnight, didn't fail after more than 100k iterations.

I took the opportunity to add some test coverage for ProcessProtocol as well, since I ended up fixing an inconsistency between PTY processes and regular processes in the course of writing the above mentioned test.

This also led me to delete SignalMixin, since it has been pointless for a while and was near the code I was changing.

comment:9 Changed 8 years ago by radix

  • Keywords review removed
  • Owner set to exarkun

It looks like test_lastWriteReceived could be way more trivial. I see no reason for it to be interactive (with its subprocess) at all; it seems you should just be able to make the subprocess write a single byte on its stdout, and make sure that you receive that byte in the test.

comment:10 Changed 8 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Okay, I simplified test_lastWriteReceived as far as I could. It still fails non-deterministically, but much less frequently now. I put a comment into the test explaining how to make it fail more frequently.

comment:11 Changed 8 years ago by therve

  • Keywords review removed
  • Owner set to exarkun

A few comments:

  • test_process needs to be merged forward (sorry...)
  • pyflakes complains about test_mail
  • test_stdio needs to be skipped for windows

comment:12 Changed 8 years ago by exarkun

  • Cc therve added
  • Keywords review added
  • Owner exarkun deleted

Done

comment:13 Changed 8 years ago by therve

  • Keywords review removed
  • Owner set to exarkun

I think this is mostly good. This a huge branch so I hope I didn't forget anything:

  • The change of Crypto imports doesn't solve anything, because conch/ssh/keys.py still import it. Not really a problem of its own, though.
  • I don't know the policy of copyright statements. I saw you change to '2001-2004,2007', I don't know if '2001-2007' wouldn't be better. Anyhow it should be done consistently in all files.
  • The new docstring of test_iutils.UtilsTestCase is better than nothing, but it would be even better it it mentioned getProcessOutput.

Please merge when OK on all this.

Thanks for the great work !

comment:14 Changed 8 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [19640]) Merge ctrl+backslash-2371+2123-5

Author: exarkun
Reviewer: jknight, radix, therve
Fixes #2371, #2123, #2378

Remove the long-since obsolete test helper SignalMixin, which set up
and tore down special signal handling state. For some time, trial has
taken care of this detail. (#2378)

Change the process file descriptor "connection lost" code to reverse
the setNonBlocking operation done during initialization, leaving the
file descriptor in the blocking state (what it probably was before it
was given to the process code). This works around a bug with PTYs on
OS X where the last write made to a non-blocking PTY might be lost if
the process exits before the data is read from the other end of the
PTY. This also corrects a (perhaps unreasonable) assumption that
blocking writes may be made to stdout after twisted.internet.stdio
has been used. (#2371)

Change ConsoleManhole so that it waits for connectionLost notification
before stopping the reactor. This avoids the possibility that the
reactor will stop before its transport has had all of its output
flushed. Combined with the fix for #2371, this fixes #2123.

comment:15 Changed 4 years ago by <automation>

  • Owner exarkun deleted
Note: See TracTickets for help on using tickets.