Opened 8 years ago

Closed 5 years ago

#2259 defect closed fixed (fixed)

stdio fails when output redirected to a file

Reported by: jknight Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: spiv Branch: branches/stdio-file-redirect-2259
(diff, github, buildbot, log)
Author: jknight, exarkun Launchpad Bug:

Description

At least on linux. Behavior of other OSes may or may not vary. See attached test case.

It works as expected when running normally. However, when redirecting output to a file, connectionLost gets called immediately.

This is because os.read() throws OSError so enableReadHack=True, but then select always returns readable&writeable for files.

Fixing this properly requires not using that read hack, which means tracking in-use sockets separately from the readable/writable lists (aka #365 and #1662), so we can get close notification.

Attachments (1)

testcase.py (1002 bytes) - added by jknight 8 years ago.

Download all attachments as: .zip

Change History (31)

Changed 8 years ago by jknight

comment:1 Changed 8 years ago by spiv

  • Cc spiv added

comment:2 Changed 8 years ago by glyph

  • Owner changed from glyph to jknight

comment:3 Changed 8 years ago by jknight

Actually this is fixable by itself, possibly, by putting a:

if stat.S_ISFIFO(os.fstat(self.fileno()).st_mode):

around the bit which does enableReadHack.

comment:4 Changed 7 years ago by therve

#2636 was closed as a duplicate of this.

comment:5 Changed 5 years ago by exarkun

(In [28892]) Add a test for StandardIO when stdout is redirected to a file; apply foom's suggested (probably) fix

refs #2259

comment:6 Changed 5 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/stdio-file-redirect-2259

(In [28891]) Branching to 'stdio-file-redirect-2259'

comment:7 Changed 5 years ago by exarkun

  • Author changed from exarkun to jknight, exarkun
  • Keywords review added
  • Owner jknight deleted

comment:8 Changed 5 years ago by itamar

  • Keywords review removed
  • Owner set to exarkun

This review doesn't quite cover the merits of the fix, since I don't understand it, but see below:

  1. Missing news fragment.
  2. The test doesn't actually write to the file. It would be good if that test or an additional one actually wrote something and the contents were verified.
  3. Reading through the existing code confuses me. In particular, after a ProcessWriter.writeSomeData(), if all the data was successfully written and the read hack is enabled you will start reading, and doRead will then cause CONNECTION_LOST *unconditionally*(??!). That doesn't make sense to me at all, and sounds like the cause of the bug, but this patch doesn't seem to address the issue. It seems like, from comments elsewhere in the code, that the "read hack" is attempting to read to see if a connection was lost, or something, but that's never done, it's only done if we have broken version of Linux, but the read hack (read() to check closeness) should probably be happening in all cases since its goal is to work around limitations of select() reactor. Or something.
  4. I find this code very mystifying in general, so even if my interpretation is wrong I would suggest better docstrings may be in order.

comment:9 Changed 5 years ago by exarkun

(In [28893]) Write some bytes to stdout to make sure it's really still open

refs #2259

comment:10 Changed 5 years ago by exarkun

(In [28895]) Try to make the fifo checking code clearer

refs #2259

comment:11 Changed 5 years ago by exarkun

(In [28896]) Document enableReadHack and add a comment where it's used in writeSomoeData

refs #2259

comment:12 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Thanks for the review!

  1. Missing news fragment.

Added.

  1. The test doesn't actually write to the file.

Added.

  1. Reading through the existing code confuses me.

Not too surprising. :)

In particular, after a ProcessWriter.writeSomeData(), if all the data was successfully written and the read hack is enabled you will start reading

So, for one thing, the read hack won't be enabled in the case being fixed here. That's what the new S_ISFIFO check the branch adds is for: to make sure to disable the read hack. So, this case won't trigger startReading.

and doRead will then cause CONNECTION_LOST *unconditionally*(??!).

It would, if the read hack were enabled. But since it's disabled, instead it just calls stopReading. It is still possible to get into doRead even with this change, because _posixstdio.StandardIO.__init__ calls startReading on the ProcessWriter it creates. I'm not exactly sure why it does this, and it might be redundant. I want to leave it alone for this fix, though, since it doesn't appear to hurt.

That doesn't make sense to me at all, and sounds like the cause of the bug, but this patch doesn't seem to address the issue.

That's sort of right. But the patch addresses the issue by changing enableReadHack from True to False for the affected case, rather than trying to change the meaning of enableReadHack.

but the read hack (read() to check closeness) should probably be happening in all cases since its goal is to work around limitations of select() reactor

All of the reactors presently have the limitation, because they (still) don't track/report readability separate from close. So the read hack is necessary on all of them for certain file descriptors (not file descriptors for normal files though). But the select call is only necessary on broken versions of linux where readability on a pipe might not mean the pipe is closed. Note that doRead returns None if enableReadHack is True, brokenLinuxPipeBehavior is True, but the fd is not reported as readable and writeable by select.

  1. I find this code very mystifying in general, so even if my interpretation is wrong I would suggest better docstrings may be in order.

I added some more comments and docstrings and re-arranged the way enableReadHack is initialized. Perhaps that makes things clearer.

comment:13 Changed 5 years ago by exarkun

(In [28898]) Give MockOS an fstat function

refs #2259

comment:14 Changed 5 years ago by itamar

What was confusing me was that I'd blanked on the fact that if doRead is called that means the fd is readable, so it's not an unconditional connection lost, it's only lost when fd is readable.

comment:15 Changed 5 years ago by exarkun

  • Keywords review removed
  • Owner set to exarkun

There are some issues with epoll and on Windows.

comment:16 Changed 5 years ago by exarkun

Well, this approach totally fails for epollreactor. epoll cannot be asked to monitor normal files. It won't even lie like select() and poll(), it just fails.

I think that means this needs to be fixed by not using ProcessWriter at all in StandardIO if the stdout fd is a regular file. That would mean ProcessWriter doesn't actually need to change at all.

comment:17 Changed 5 years ago by exarkun

Considering this further, the epoll behavior in this branch isn't a regression. trunk is as broken for epoll+normal-file-stdout. So that's not necessarily a reason to block applying this solution (though of course the new tests would need to be marked todo for epoll reactor, and there should be a new ticket for making this work with epoll).

comment:18 Changed 5 years ago by exarkun

(In [28907]) todo the new test on epollreactor

refs #2259

comment:19 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Alright, Windows and epollreactor now skip the new test. If this seems like a reasonable course, I'll file a new ticket for making this work on epoll and reference it from the skip.

comment:20 Changed 5 years ago by itamar

If using a thread for stdout would make it work everywhere (and also solve issues when talking to programs that don't expect their stdin to be non-blocking), maybe we should just do that and fix everything in one fell swoop?

comment:21 Changed 5 years ago by glyph

It's never as easy as "spawn a thread and write 'till you block" :). If we use a thread for stdout, we need to make sure that it's still possible to cleanly shut down without crashing python or getting into any race conditions or whatever, even if the FD being used for stdout starts off blocking and never recovers. It also adds some overhead if you've got a server with a bunch of subprocesses that it needs to talk to over stdin/stdout.

That could still be the right thing to do, of course, but it would be nice to avoid it in the cases where we could.

comment:22 Changed 5 years ago by itamar

Threads are easy like Monday morning, but OK. What's the epoll-specific solution, if any?

comment:23 follow-up: Changed 5 years ago by jknight

Don't forget about #3442...

comment:24 Changed 5 years ago by exarkun

If someone wants to implement the thread-based solution, I'm not going to stand in their way. I'm not likely to implement it in the near future. I'm interested in having this fixed in the selectreactor case because it's breaking a real application. That application handles three file descriptors, so it doesn't really need epoll.

comment:25 in reply to: ↑ 23 Changed 5 years ago by glyph

Replying to jknight:

Don't forget about #3442...

Well, I'm not forgetting about it, but I sure wish I could.

comment:26 Changed 5 years ago by itamar

  • Keywords review removed
  • Owner set to exarkun
  1. You should open ticket for epoll and then update the #NNNN reference in the skip message for the test. Also maybe update news fragment to mention epoll is still issue.
  1. Is there any way we can detect epoll when creating the StandardIO class, and throwing exception to the user so they don't get really weird result and have to track the bug down? If you think fixing epoll soon is realistic that may be unnecessary.

Or maybe we can go straight to fixing 3442 before next release ;)

Other than that, I guess it's done.

comment:27 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted
  1. You should open ticket for epoll and then update the #NNNN reference in the skip message for the test. Also maybe update news fragment to mention epoll is still issue.

Done.

  1. Is there any way we can detect epoll when creating the StandardIO class, and throwing exception to the user so they don't get really weird result and have to track the bug down?

Yea-ish. It raises a RuntimeError with a message and a ticket reference now.

comment:28 Changed 5 years ago by itamar

  • Keywords review removed
  • Owner set to exarkun

EPERM could in theory be caused by some other reactor with some other arbitrary limitation; I suggest slightly expanding the RuntimeError message, to say something like "The current reactor does not support this type of file descriptor, e.g. epollreactor doesn't support etc.".

Other than that, looks good, please commit.

comment:29 Changed 5 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [28927]) Merge stdio-file-redirect-2259

Author: jknight, exarkun
Reviewer: itamar
Fixes: #2259

Support stdout being redirected to a normal file when using twisted.internet.stdio.

comment:30 Changed 4 years ago by <automation>

  • Owner exarkun deleted
Note: See TracTickets for help on using tickets.