Opened 13 years ago

Closed 7 years ago

#733 defect closed fixed (fixed)

twisted's SIGCHLD handler breaks popen.

Reported by: jknight Owned by:
Priority: high Milestone:
Component: core Keywords:
Cc: Glyph, Jean-Paul Calderone, spiv, itamarst, jknight, titty, Thijs Triemstra Branch: branches/popen-compatibility-733-6
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun

Description (last modified by Jean-Paul Calderone)

Highest priority is for regressions and reviews.

Change History (43)

comment:1 Changed 13 years ago by jknight

This is a simple way to demonstrate:
>>> from twisted.internet import reactor
>>> reactor._handleSignals()
>>> f=os.popen("sleep 5; echo 'Foo'"); f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: [Errno 4] Interrupted system call

The solution is easy: our SIGCHLD handler needs to be given a SA_RESTART flag. This will cause it to 
not be handled until after the read call returns, as python does not allow the python sighandler code to 
be run in the interrupt context. But that's okay.

There's only one minor problem: the python signal() wrapper doesn't provide the flags parameter that's 
necessary in order to specify SA_RESTART. So, to fix this, someone should write a C extension module 
that does something like this:

void add_signalflags(int sig, int moreflags)
{
  int res;
  struct sigaction act;
  res = sigaction(sig, NULL, &act);
  if (res < 0)
    RAISE ERROR
  
  act.sa_flags |= moreflags;
  res = sigaction(sig, &act, NULL);
  if (res < 0)
    RAISE ERROR
}

comment:2 Changed 13 years ago by Glyph

OK, no useful feedback, but AWESOME for figuring this out, james.  Thanks.

comment:3 Changed 13 years ago by Glyph

OK wait no, I changed my mind.  Will passing this flag still (properly)
interrupt select()?

comment:4 Changed 13 years ago by jknight

According to my documentation it will.
     If a signal is caught during the system calls listed below, the call may
     be forced to terminate with the error EINTR, the call may return with a
     data transfer shorter than requested, or the call may be restarted.
     Restart of pending calls is requested by setting the SA_RESTART bit in
     sa_flags.  The affected system calls include open(2), read(2), write(2),
     sendto(2), recvfrom(2), sendmsg(2) and recvmsg(2) on a communications
     channel or a slow device (such as a terminal, but not a regular file) and
     during a wait(2) or ioctl(2).  However, calls that have already committed
     are not restarted, but instead return a partial success (for example, a
     short read count).

May want to make *sure* by testing, though.

comment:5 Changed 13 years ago by itamarst

On Redhat 7.3 I had a hard time reproducing this, but:

>>> signal.alarm(1); f=os.popen("sleep 5; echo 'Foo'"); f.read()

reproduced it some of the time.

comment:6 Changed 11 years ago by ColinAlston

Component: core
Milestone: Twisted-2.5
Priority: highhighest

Reproduced thanks to Gentoo Emerge and its crappy signaling

avail = os.popen('emerge -p --nocolor --nospinner %s | grep %s' % (pack, pack)) k = avail.read()

Something like that, put it anywhere you please, and you will intermitantly generate this bug on Twisted 2.0.1

comment:7 Changed 11 years ago by spiv

moshez made an interesting suggestion on IRC -- I wonder if we can implement the missing add_signalflags function with ctypes?

comment:8 Changed 11 years ago by Jean-Paul Calderone

Description: modified (diff)
Milestone: Twisted-2.5
Priority: highesthigh

comment:9 Changed 11 years ago by Jean-Paul Calderone

See also #791 and #1997

comment:10 Changed 11 years ago by titty

Cc: titty added

Feel free to use: http://systemexit.de/repo/sigalter It's a c module, which provides a set_SA_RESTART(SIGNUM) method.

Here's the code (in case the above server is unreachable):

#include <Python.h>
#include <signal.h>

static PyObject *
set_SA_RESTART (void *self, PyObject *args)
{
        struct sigaction sa;

        int signum;
        if (!PyArg_ParseTuple(args, "i:set_SA_RESTART", &signum)) {
                return NULL;
        }

        if (sigaction(signum, NULL, &sa)==-1) {
                goto error;
        }

        if (!(sa.sa_flags & SA_RESTART)) {
                sa.sa_flags |= SA_RESTART;
                if (sigaction(signum, &sa, NULL)==-1) {
                        goto error;
                }
        }

        return Py_BuildValue("");

error:
        PyErr_SetFromErrno(PyExc_OSError);
        return NULL;
}

static PyMethodDef module_function[] = {
        {"set_SA_RESTART", (PyCFunction)set_SA_RESTART, METH_VARARGS, 
         "set_SA_RESTART(SIGNUM):\n   sets SA_RESTART flag on the signalhandler for signal SIGNUM"},
        { NULL,    NULL }
};


static char module_doc[] = "provide funtions to alter signal handling";

DL_EXPORT(void) initsigalter(void)
{
        Py_InitModule3("sigalter", module_function, module_doc);
}

comment:11 Changed 11 years ago by itamarst

Owner: changed from itamarst to jknight

comment:12 Changed 10 years ago by titty

the ctypes version (for linux):

import ctypes
import signal
libc = ctypes.CDLL("libc.so.6")
libc.siginterrupt(signal.SIGCHLD, 0)

comment:13 Changed 10 years ago by Jean-Paul Calderone

#791, #1997, and #2535 have significant overlap with this. Fixing any one of these (except for #2535, perhaps) almost certainly means fixing all of them.

The correct fix for most of them is:

  • Write a SIGCHLD handler in C. All it needs to do is write to an fd.
  • Install the SIGCHLD handler with the SA_RESTART flag. This makes it run when a child exits, but doesn't cause EINTR.
  • Include the fd in the reactor so that whenever it is written to, the reactor will wake up and reap dead children.

With this solution:

  • No more EINTR
  • No more race condition inside ceval.c

However, a custom SIGCHLD handler is still required, so #2535 isn't resolved.

comment:14 Changed 9 years ago by titty

http://docs.python.org/dev/library/signal.html:

python 2.6 will have signal.siginterrupt and signal.set_wakeup_fd.

comment:15 Changed 9 years ago by Glyph

Owner: changed from jknight to Glyph
Status: newassigned

comment:16 in reply to:  14 Changed 9 years ago by Thijs Triemstra

Cc: Thijs Triemstra added

Replying to titty:

http://docs.python.org/dev/library/signal.html:

python 2.6 will have signal.siginterrupt and signal.set_wakeup_fd.

Now a Python 2.6 buildslave available http://buildbot.twistedmatrix.com/builders/ubuntu64-python2.6-select/

comment:17 Changed 9 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/popen-compatibility-733

(In [25649]) Branching to 'popen-compatibility-733'

comment:18 Changed 9 years ago by Glyph

Should I kill off the branches for #791? It looks like the code useful for addressing this problem has all been inherited by this branch.

comment:19 Changed 9 years ago by Glyph

Owner: changed from Glyph to Jean-Paul Calderone
Status: assignednew

Also if you're working on a ticket, please accept it (or at least reassign it to yourself).

comment:20 Changed 9 years ago by Jean-Paul Calderone

Deleting branches is so boring. I think I may have gotten all the good bits of the #791 branch at this point, so feel free to delete it if you like. Someday I'll probably write a program that deletes branches which are associated with tickets which have been closed for a while. Until then I don't plan to think about it much.

comment:21 Changed 9 years ago by Jean-Paul Calderone

Branch: branches/popen-compatibility-733branches/popen-compatibility-733-2

(In [26139]) Branching to 'popen-compatibility-733-2'

comment:22 Changed 8 years ago by Glyph

This is weird.

Under 2.5, using many different variations of James's example, I can no longer reproduce this problem. It still raises an exception on 2.5 though.

comment:23 Changed 8 years ago by Glyph

Ugh, hit "submit" too early.

Under python 2.5 and 2.6, and Twisted trunk, using many different variations of James's example, I can no longer reproduce this problem. It still raises an exception on python 2.4 though.

I can't see what's changed. PyOS_setsig seems to be the same, at least in 2.5.

comment:24 Changed 8 years ago by jknight

So what's left to do on that branch?

From a cusory review, I'd say:

  • it needs a test.
  • It shouldn't use the custom _sigchld.c module under python 2.6.

comment:25 Changed 8 years ago by Thijs Triemstra

Not able to reproduce this either on osx 10.5.8:

Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> from twisted.internet import reactor
>>> f = os.popen("sleep 5; echo 'Foo'"); f.read()
'Foo\n'

comment:26 Changed 8 years ago by Jean-Paul Calderone

It's a real issue.

comment:27 Changed 8 years ago by Jean-Paul Calderone

Branch: branches/popen-compatibility-733-2branches/popen-compatibility-733-3

(In [27731]) Branching to 'popen-compatibility-733-3'

comment:28 Changed 8 years ago by Jean-Paul Calderone

Branch: branches/popen-compatibility-733-3branches/popen-compatibility-733-4

(In [28250]) Branching to 'popen-compatibility-733-4'

comment:29 Changed 7 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Okay!

comment:30 Changed 7 years ago by David Reid

Keywords: review removed
Owner: set to Jean-Paul Calderone
  • Lots of undefined references (#NNNN)
  • twisted/internet/test/test_baseprocess.py has it's copyright line updated but no changes
  • There isn't a test showing that twisted.internet.posixbase._UnixWaker.wakeUp raises when the write fails with something other than EAGAIN (It's in the diff but only because the classes position shifted in the file)
  • Possible typo in PosixReactorBase mopde = self._checkMode('IReactorUNIXDatagram.connectUNIXDatagram', mode)

comment:31 Changed 7 years ago by Jean-Paul Calderone

Branch: branches/popen-compatibility-733-4branches/popen-compatibility-733-5

(In [28302]) Branching to 'popen-compatibility-733-5'

comment:32 Changed 7 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Thanks. Fixed those points. Please review.

comment:33 Changed 7 years ago by Glyph

Owner: set to Glyph
Status: newassigned

comment:34 Changed 7 years ago by Glyph

Keywords: review removed
Owner: changed from Glyph to Jean-Paul Calderone
Status: assignednew

So, uh... oops? I wanted to see a all-supported-builders build report for this branch, so I did this:

glyph@miranda:~/Projects/Twisted/sandbox/exarkun$ python force-builds.py 
Traceback (most recent call last):
  File "force-builds.py", line 10, in <module>
    from twisted.internet import reactor, defer
  File "/Domicile/glyph/Projects/Twisted/trunk/twisted/internet/reactor.py", line 37, in <module>
    from twisted.internet import selectreactor
  File "/Domicile/glyph/Projects/Twisted/trunk/twisted/internet/selectreactor.py", line 21, in <module>
    from twisted.internet import posixbase
  File "/Domicile/glyph/Projects/Twisted/trunk/twisted/internet/posixbase.py", line 181, in <module>
    _Waker = _PipeWaker
NameError: name '_PipeWaker' is not defined
[Error: 1]

but I think that's the review by itself.

comment:35 Changed 7 years ago by Jean-Paul Calderone

(In [28362]) Update to use the new name

refs #733

comment:36 Changed 7 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Sigh, forgot about that. Fixed now.

comment:37 Changed 7 years ago by terrycojones

The example shown at the top of the ticket works for me. trial twisted reports no errors.

comment:38 Changed 7 years ago by Glyph

Owner: set to Glyph
Status: newassigned

comment:39 Changed 7 years ago by Jean-Paul Calderone

Branch: branches/popen-compatibility-733-5branches/popen-compatibility-733-6

(In [28435]) Branching to 'popen-compatibility-733-6'

comment:40 Changed 7 years ago by Jean-Paul Calderone

(In [28445]) Quiet the error about the result of write being ignored

refs #733

comment:41 Changed 7 years ago by Glyph

Keywords: review removed
Owner: changed from Glyph to Jean-Paul Calderone
Status: assignednew

Yay for C code!

It appears that the feedback from other reviewers has been adequately dealt with. I can't find anything else to complain about.

Merge, and rejoice. (In that order.)

comment:42 Changed 7 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [28447]) Merge popen-compatibility-733-6

Author: glyph, itamar, exarkun, (perhaps others sorry if I forgot you) Reviewer: glyph, dreid, jknight Fixes: #733

Change the reactors to, on POSIX, use signal.set_wakeup_fd and signal.siginterrupt to support child processes without creating the possibility for causing syscalls to fail with EINTR. On Python 2.5 and older, where these APIs are not available, a new extension module is added to implement the functionality. If neither of these is available, revert to the old behavior. Also Glib2/Gtk2 reactor retain the old behavior due to implementation difficulties.

comment:43 Changed 7 years ago by <automation>

Owner: Jean-Paul Calderone deleted
Note: See TracTickets for help on using tickets.