Opened 6 years ago

Last modified 2 years ago

#3442 defect new

t.i.stdio must not set nonblocking on stdio FDs

Reported by: jknight Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: exarkun, itamar, jknight, glyph, ezyang Branch:
Author: Launchpad Bug:

Description

It seems that O_NONBLOCK is a property of file descriptions, not of file descriptors. This means that it's incorrect for any code to ever set it on a file descriptor that it did not just freshly create with open(), socket(), etc.

t.i.stdio needs to instead do a workaround like http://cr.yp.to/unix/nonblock.html describes. That is: use select/etc as usual to find when the fd is probably readable/writable. Then, try to read/write it. However, if it turns out select lied, you need a way to escape from the blocking read/write. So, you have to set a timer to send a signal to yourself (e.g. SIGALARM) around the read/write, so that it interrupts the syscall for you after a short period of time, in case select lied.

This is totally sucky but I can't see a correct alternative (other than using threads, like the windows version does).

<foom> yeah, so t.i.stdio is totally broken, it can't set nonblocking on random fds it inherits, because that's a property of the file description not file descriptor.
<exarkun> Okay, that's a property of the file description.  So?
<foom> you run a t.i.stdio app. it sets stdout to nonblocking mode. Now, stdout is nonblocking for all processes that share that same stream.
<exarkun> How many people want to share stdout among multiple processes?
<foom> e.g. you return back to the shell, now the shell's stdout is nonblocking
<exarkun> That'd be easily fixed by making them blocking again before exiting
<foom> unless you don't exit cleanly
<foom> or unless someone else is running concurrently
<exarkun> If you don't exit cleanly, you might leave stdout in an inconsistent state anyway
<exarkun> And concurrency is a different case :)
<foom> you might? how?
<glyph> foom: sys.stdout.write("\x1b")
<foom> at least random other programs won't fail with EAGAIN
<exarkun> I've never wanted to programmatically interact with stdout that was in concurrent use by another process
<exarkun> That's why I said it was an uncommon case
<foom> you've never ran program & in a shell?
<foom> and had it print something?
<exarkun> not when `program´ wanted to play with stdout
<exarkun> well, not that's not true
<exarkun> I have
<exarkun> But whenever the program prints to stdout I get pissed up, foreground it and kill it.
<exarkun> Why would I want a background process talking to my stdio?  It's *background*!  The foreground process is the one that gets to talk to stdio.
<foom> so, in the meantime, since the shell printed something, it set the file back to blocking mode
<exarkun> Anyway, that's irrelevant - it's *possible* to share, I'm not denying that
<exarkun> But I really think it's very uncommon.
<foom> and now your twisted program is blocking on stdio
<foom> for the rest of its life
<foom> well, I just ran into this in real life.
<exarkun> Okay, so you probably want a fix :)
<foom> and it took quite a while to realize what the hell was going on
<exarkun> Before I was thinking that twisted.internet.stdio didn't need to change completely to accomodate an uncommon use-case
<foom> because I didn't even realize nonblockingness was a property of the file description
<exarkun> But now I'm thinking it can accomodate this use-case without changing completely
<exarkun> What if it could dup stdio and make the copies non-blocking?
<exarkun> Is that a fix?
<foom> no
<foom> dup doesn't copy the file description
<foom> it copies the file descriptor
<exarkun> okay
<foom> see, this is really evil.
<exarkun> shared mutable state usually is ;)
<itamar> what do other applications do?
<exarkun> itamar: I bet they break and die.
<exarkun> foom just said what shells do - set the fd back to blocking, but I bet that's not a reliable fix for them
<exarkun> (ie, what if twisted sets it non-blocking in between doing that and doing a read)
<foom> they do that in response to EAGAIN I believe
<foom> so it should work out okay for them. :)
<exarkun> around every read we do, we could lock the fd and set it non-blocking :)
<foom> lock it? how?
<exarkun> mandatory non-posix locking of course
<exarkun> (this is a joke)
<exarkun> bbiaf, coffee
<itamar> http://cr.yp.to/unix/nonblock.html
<foom> man, that sucks, but it sure would work.
<foom> select() would usually give the right answer, if it doesn't, the signal will break you out of the read.
<foom> it shouldn't even be that hard to fix t.i.stdio to do that.
<foom> sucksucksucksuck
<itamar> do we still want to set non-block?
<foom> no
<foom> you can't
<foom> it's basically incorrect to set nonblockingness on any fd you didn't create
<itamar> :(

Change History (9)

comment:2 Changed 6 years ago by glyph

  • Owner changed from glyph to jknight

I don't think that setting an timer is a viable solution to this problem, mainly because itimers aren't available in any currently released version of python, and alarm's resolution makes it unhelpful.

Signals are also have that inherent race condition that everybody's favorite bug talks about. Of course, it's pretty unlikely that you'll schedule an itimer, get swapped out and stop executing, then get scheduled such that your timer executes before you struggle your way on to the read() call and then block forever, but it's still possible.

I think threads are more likely to yield a correct solution. Start two threads: one for reading, one for writing. Use os.read to avoid file object concurrency problems. Leave them running as long as StandardIO is active, and deliver the data to the main thread. You should be able to easily nuke a wedged thread and cleanly exit just by closing the appropriate FD.

(Reassigning to the reporter because while this poses an interesting intellectual challenge for me, I don't think that there's anything we can really do about terminal sharing - and we are talking about terminals here, because what other FD could you possibly find yourself sharing? My comment about '\x1b' was serious. If you don't own output stream, there's nothing you can do to prevent other programs from puking up their guts on your display and putting it into a completely undefined, unknowable state. Want to ask the terminal what state it's in with the vt100 interrogation protocol? Too bad! You reported the position but then some random forked program moved it around before you could use that information. More closely related to this problem: who says your parent didn't set O_NONBLOCK on this FD? It created it, after all. No standard that I can find says that stdio needs to be blocking, it just usually is: our thread-based solution should be prepared to cope with other similarly naive programs setting O_NONBLOCK without breaking. A more realistic solution to the larger problem here would be to write a feature-complete clone of bash and screen and get the world to adopt a Twisted-based shell for everything, then give every spawned process its own PTYs so that they can be independently managed. The underlying I/O model is just broken and there's pretty much nothing anyone can do about it.)

comment:3 Changed 6 years ago by jknight

I agree that threads are the only way to do it. However, closing the fd is not a viable solution. 1) it doesn't appear to interrupt the syscall in progress, 2) it's really bad to do that in a multithreaded program where another thread could come along and open a new file with that fd number.

This demo program seems to demonstrate a method that might work to shut down the threads. The choice of signal is irrelevant, as long as it has a handler, and *isn't* SA_RESTART.

import threading, os, time,signal, ctypes, errno

signal.signal(signal.SIGCHLD, lambda *args: 1)
running = True

def t_func(fd):
    global thread_id
    thread_id = ctypes.pythonapi.pthread_self()
    print "started"
    data = ''
    try:
        while running:
            print os.read(fd, 100)
    except OSError, e:
        if e.errno != errno.EINTR:
            raise
    print "ended"

t = threading.Thread(target=lambda : t_func(0))
t.start()

time.sleep(1)
running = False
ctypes.pythonapi.pthread_kill(thread_id, signal.SIGCHLD)
print "killed"
t.join()
print "joined"

comment:4 Changed 6 years ago by jknight

PS: the scenario I ran into this had nothing at all to do with terminals. It's in a test suite.

There's a program which basically all it does is to send stdio to and from a socket. It uses nonblocking IO for this.
There's a server, which this program is connecting to, which writes some stuff to stdout.

The first program (in this instance) gets run with stdin of a dedicated pipe, and stdout unchanged, so, pointing to the testrunner's log, which, just so happens to be a pipe. The output of this command is empty, or nearly so, the command is being run for its side-effects. But stdin and stdout both get set non-blocking.

The server then logs some data to stdout, and pukes its guts out with an unexpected EINTR.

I'm pretty sure the opposite problem was happening to me too: that in some case, the streams were getting set back to blocking out from under the nbio-expecting process, thus causing it to hang. But that's just a guess; I don't have a strace log of that happening.

This has just about nothing to do with terminals, except that it breaks terminals *even worse* than other things, as a terminal shares this flag between stdin, stdout, and stderr. So a t.i.stdio-using program which ever spawns a shell as a subprocess, sharing *any* of the streams (e.g. just stderr, as might be common...), is also totally broken. Because the shell will set the terminal (all of stdin, stdout, and stderr) back to blocking.

comment:5 Changed 5 years ago by ezyang

  • Cc ezyang added

Got bit by this bug. CC'ed.

comment:6 Changed 4 years ago by itamar

So we don't forget: while ticket #2259 is fixed, epoll reactor still doesn't support redirecting stdout or stderr to a file. Fixing this bug would solve that.

comment:7 Changed 4 years ago by exarkun

Fixing this bug would solve that.

If the fix takes the form of never putting stdio file descriptors into the reactor, at least. There might be other ways to fix it.

Also see #4429 for the remaining epollreactor issue.

comment:8 Changed 3 years ago by <automation>

  • Owner jknight deleted

comment:9 Changed 2 years ago by itamar

sendmsg()/recvmsg() with MSG_DONTWAIT might be a possible solution for this.

Note: See TracTickets for help on using tickets.