Opened 10 years ago

Closed 5 years ago

#733 defect closed fixed (fixed)

twisted's SIGCHLD handler breaks popen.

Reported by: jknight Owned by:
Priority: high Milestone:
Component: core Keywords:
Cc: glyph, exarkun, spiv, itamarst, jknight, titty, thijs Branch: branches/popen-compatibility-733-6
(diff, github, buildbot, log)
Author: exarkun Launchpad Bug:

Description (last modified by exarkun)

Highest priority is for regressions and reviews.

Change History (43)

comment:1 Changed 10 years ago by jknight

This is a simple way to demonstrate:
>>> from twisted.internet import reactor
>>> reactor._handleSignals()
>>> f=os.popen("sleep 5; echo 'Foo'"); f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: [Errno 4] Interrupted system call

The solution is easy: our SIGCHLD handler needs to be given a SA_RESTART flag. This will cause it to 
not be handled until after the read call returns, as python does not allow the python sighandler code to 
be run in the interrupt context. But that's okay.

There's only one minor problem: the python signal() wrapper doesn't provide the flags parameter that's 
necessary in order to specify SA_RESTART. So, to fix this, someone should write a C extension module 
that does something like this:

void add_signalflags(int sig, int moreflags)
{
  int res;
  struct sigaction act;
  res = sigaction(sig, NULL, &act);
  if (res < 0)
    RAISE ERROR
  
  act.sa_flags |= moreflags;
  res = sigaction(sig, &act, NULL);
  if (res < 0)
    RAISE ERROR
}

comment:2 Changed 10 years ago by glyph

OK, no useful feedback, but AWESOME for figuring this out, james.  Thanks.

comment:3 Changed 10 years ago by glyph

OK wait no, I changed my mind.  Will passing this flag still (properly)
interrupt select()?

comment:4 Changed 10 years ago by jknight

According to my documentation it will.
     If a signal is caught during the system calls listed below, the call may
     be forced to terminate with the error EINTR, the call may return with a
     data transfer shorter than requested, or the call may be restarted.
     Restart of pending calls is requested by setting the SA_RESTART bit in
     sa_flags.  The affected system calls include open(2), read(2), write(2),
     sendto(2), recvfrom(2), sendmsg(2) and recvmsg(2) on a communications
     channel or a slow device (such as a terminal, but not a regular file) and
     during a wait(2) or ioctl(2).  However, calls that have already committed
     are not restarted, but instead return a partial success (for example, a
     short read count).

May want to make *sure* by testing, though.

comment:5 Changed 10 years ago by itamarst

On Redhat 7.3 I had a hard time reproducing this, but:

>>> signal.alarm(1); f=os.popen("sleep 5; echo 'Foo'"); f.read()

reproduced it some of the time.

comment:6 Changed 8 years ago by ColinAlston

  • Component set to core
  • Milestone set to Twisted-2.5
  • Priority changed from high to highest

Reproduced thanks to Gentoo Emerge and its crappy signaling

avail = os.popen('emerge -p --nocolor --nospinner %s | grep %s' % (pack, pack))
k = avail.read()

Something like that, put it anywhere you please, and you will intermitantly generate this bug on Twisted 2.0.1

comment:7 Changed 8 years ago by spiv

moshez made an interesting suggestion on IRC -- I wonder if we can implement the missing add_signalflags function with ctypes?

comment:8 Changed 8 years ago by exarkun

  • Description modified (diff)
  • Milestone Twisted-2.5 deleted
  • Priority changed from highest to high

comment:9 Changed 8 years ago by exarkun

See also #791 and #1997

comment:10 Changed 8 years ago by titty

  • Cc titty added

Feel free to use: http://systemexit.de/repo/sigalter
It's a c module, which provides a set_SA_RESTART(SIGNUM) method.

Here's the code (in case the above server is unreachable):

#include <Python.h>
#include <signal.h>

static PyObject *
set_SA_RESTART (void *self, PyObject *args)
{
        struct sigaction sa;

        int signum;
        if (!PyArg_ParseTuple(args, "i:set_SA_RESTART", &signum)) {
                return NULL;
        }

        if (sigaction(signum, NULL, &sa)==-1) {
                goto error;
        }

        if (!(sa.sa_flags & SA_RESTART)) {
                sa.sa_flags |= SA_RESTART;
                if (sigaction(signum, &sa, NULL)==-1) {
                        goto error;
                }
        }

        return Py_BuildValue("");

error:
        PyErr_SetFromErrno(PyExc_OSError);
        return NULL;
}

static PyMethodDef module_function[] = {
        {"set_SA_RESTART", (PyCFunction)set_SA_RESTART, METH_VARARGS, 
         "set_SA_RESTART(SIGNUM):\n   sets SA_RESTART flag on the signalhandler for signal SIGNUM"},
        { NULL,    NULL }
};


static char module_doc[] = "provide funtions to alter signal handling";

DL_EXPORT(void) initsigalter(void)
{
        Py_InitModule3("sigalter", module_function, module_doc);
}

comment:11 Changed 8 years ago by itamarst

  • Owner changed from itamarst to jknight

comment:12 Changed 7 years ago by titty

the ctypes version (for linux):

import ctypes
import signal
libc = ctypes.CDLL("libc.so.6")
libc.siginterrupt(signal.SIGCHLD, 0)

comment:13 Changed 7 years ago by exarkun

#791, #1997, and #2535 have significant overlap with this. Fixing any one of these (except for #2535, perhaps) almost certainly means fixing all of them.

The correct fix for most of them is:

  • Write a SIGCHLD handler in C. All it needs to do is write to an fd.
  • Install the SIGCHLD handler with the SA_RESTART flag. This makes it run when a child exits, but doesn't cause EINTR.
  • Include the fd in the reactor so that whenever it is written to, the reactor will wake up and reap dead children.

With this solution:

  • No more EINTR
  • No more race condition inside ceval.c

However, a custom SIGCHLD handler is still required, so #2535 isn't resolved.

comment:14 follow-up: Changed 6 years ago by titty

http://docs.python.org/dev/library/signal.html:

python 2.6 will have signal.siginterrupt and signal.set_wakeup_fd.

comment:15 Changed 6 years ago by glyph

  • Owner changed from jknight to glyph
  • Status changed from new to assigned

comment:16 in reply to: ↑ 14 Changed 6 years ago by thijs

  • Cc thijs added

Replying to titty:

http://docs.python.org/dev/library/signal.html:

python 2.6 will have signal.siginterrupt and signal.set_wakeup_fd.

Now a Python 2.6 buildslave available http://buildbot.twistedmatrix.com/builders/ubuntu64-python2.6-select/

comment:17 Changed 6 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/popen-compatibility-733

(In [25649]) Branching to 'popen-compatibility-733'

comment:18 Changed 6 years ago by glyph

Should I kill off the branches for #791? It looks like the code useful for addressing this problem has all been inherited by this branch.

comment:19 Changed 6 years ago by glyph

  • Owner changed from glyph to exarkun
  • Status changed from assigned to new

Also if you're working on a ticket, please accept it (or at least reassign it to yourself).

comment:20 Changed 6 years ago by exarkun

Deleting branches is so boring. I think I may have gotten all the good bits of the #791 branch at this point, so feel free to delete it if you like. Someday I'll probably write a program that deletes branches which are associated with tickets which have been closed for a while. Until then I don't plan to think about it much.

comment:21 Changed 6 years ago by exarkun

  • Branch changed from branches/popen-compatibility-733 to branches/popen-compatibility-733-2

(In [26139]) Branching to 'popen-compatibility-733-2'

comment:22 Changed 5 years ago by glyph

This is weird.

Under 2.5, using many different variations of James's example, I can no longer reproduce this problem. It still raises an exception on 2.5 though.

comment:23 Changed 5 years ago by glyph

Ugh, hit "submit" too early.

Under python 2.5 and 2.6, and Twisted trunk, using many different variations of James's example, I can no longer reproduce this problem. It still raises an exception on python 2.4 though.

I can't see what's changed. PyOS_setsig seems to be the same, at least in 2.5.

comment:24 Changed 5 years ago by jknight

So what's left to do on that branch?

From a cusory review, I'd say:

  • it needs a test.
  • It shouldn't use the custom _sigchld.c module under python 2.6.

comment:25 Changed 5 years ago by thijs

Not able to reproduce this either on osx 10.5.8:

Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> from twisted.internet import reactor
>>> f = os.popen("sleep 5; echo 'Foo'"); f.read()
'Foo\n'

comment:26 Changed 5 years ago by exarkun

It's a real issue.

comment:27 Changed 5 years ago by exarkun

  • Branch changed from branches/popen-compatibility-733-2 to branches/popen-compatibility-733-3

(In [27731]) Branching to 'popen-compatibility-733-3'

comment:28 Changed 5 years ago by exarkun

  • Branch changed from branches/popen-compatibility-733-3 to branches/popen-compatibility-733-4

(In [28250]) Branching to 'popen-compatibility-733-4'

comment:29 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Okay!

comment:30 Changed 5 years ago by dreid

  • Keywords review removed
  • Owner set to exarkun
  • Lots of undefined references (#NNNN)
  • twisted/internet/test/test_baseprocess.py has it's copyright line updated but no changes
  • There isn't a test showing that twisted.internet.posixbase._UnixWaker.wakeUp raises when the write fails with something other than EAGAIN (It's in the diff but only because the classes position shifted in the file)
  • Possible typo in PosixReactorBase mopde = self._checkMode('IReactorUNIXDatagram.connectUNIXDatagram', mode)

comment:31 Changed 5 years ago by exarkun

  • Branch changed from branches/popen-compatibility-733-4 to branches/popen-compatibility-733-5

(In [28302]) Branching to 'popen-compatibility-733-5'

comment:32 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Thanks. Fixed those points. Please review.

comment:33 Changed 5 years ago by glyph

  • Owner set to glyph
  • Status changed from new to assigned

comment:34 Changed 5 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to exarkun
  • Status changed from assigned to new

So, uh... oops? I wanted to see a all-supported-builders build report for this branch, so I did this:

glyph@miranda:~/Projects/Twisted/sandbox/exarkun$ python force-builds.py 
Traceback (most recent call last):
  File "force-builds.py", line 10, in <module>
    from twisted.internet import reactor, defer
  File "/Domicile/glyph/Projects/Twisted/trunk/twisted/internet/reactor.py", line 37, in <module>
    from twisted.internet import selectreactor
  File "/Domicile/glyph/Projects/Twisted/trunk/twisted/internet/selectreactor.py", line 21, in <module>
    from twisted.internet import posixbase
  File "/Domicile/glyph/Projects/Twisted/trunk/twisted/internet/posixbase.py", line 181, in <module>
    _Waker = _PipeWaker
NameError: name '_PipeWaker' is not defined
[Error: 1]

but I think that's the review by itself.

comment:35 Changed 5 years ago by exarkun

(In [28362]) Update to use the new name

refs #733

comment:36 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Sigh, forgot about that. Fixed now.

comment:37 Changed 5 years ago by terrycojones

The example shown at the top of the ticket works for me. trial twisted reports no errors.

comment:38 Changed 5 years ago by glyph

  • Owner set to glyph
  • Status changed from new to assigned

comment:39 Changed 5 years ago by exarkun

  • Branch changed from branches/popen-compatibility-733-5 to branches/popen-compatibility-733-6

(In [28435]) Branching to 'popen-compatibility-733-6'

comment:40 Changed 5 years ago by exarkun

(In [28445]) Quiet the error about the result of write being ignored

refs #733

comment:41 Changed 5 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to exarkun
  • Status changed from assigned to new

Yay for C code!

It appears that the feedback from other reviewers has been adequately dealt with. I can't find anything else to complain about.

Merge, and rejoice. (In that order.)

comment:42 Changed 5 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [28447]) Merge popen-compatibility-733-6

Author: glyph, itamar, exarkun, (perhaps others sorry if I forgot you)
Reviewer: glyph, dreid, jknight
Fixes: #733

Change the reactors to, on POSIX, use signal.set_wakeup_fd
and signal.siginterrupt to support child processes without
creating the possibility for causing syscalls to fail with
EINTR. On Python 2.5 and older, where these APIs are not
available, a new extension module is added to implement the
functionality. If neither of these is available, revert to
the old behavior. Also Glib2/Gtk2 reactor retain the old
behavior due to implementation difficulties.

comment:43 Changed 4 years ago by <automation>

  • Owner exarkun deleted
Note: See TracTickets for help on using tickets.