Opened 11 years ago

Closed 8 years ago

#1997 defect closed fixed (fixed)

perhaps wakeUp could be slightly simpler

Reported by: wingo Owned by: Jean-Paul Calderone
Priority: lowest Milestone:
Component: core Keywords:
Cc: spiv, jknight, Jean-Paul Calderone, Glyph Branch: branches/fix-wakeup-1997
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun

Description (last modified by Glyph)

The test for what thread we're in is unnecessary, and slightly worsens a race with select() because the handler could be called after processing events in the queue but before entering select().

We're leaving it this way for now, because the race is easier to trigger in its current configuration (and still exists in other configurations), but will be removing this redundant and unnecessary code either after or during the fix for the real underlying issue.

See #791.

Attachments (1)

nocalllater.diff (679 bytes) - added by Glyph 11 years ago.
my suggested fix, hopefully incredibly simple

Download all attachments as: .zip

Change History (22)

comment:1 Changed 11 years ago by spiv

Cc: spiv added

comment:2 Changed 11 years ago by Glyph

Probably a good idea. The waker also needs unit tests.

comment:3 Changed 11 years ago by jknight

Huh "oops". Indeed, that isInIOThread shouldn't be there. I saw this bug in passing quite a while ago, and had a false memory that I'd already fixed it. :(

comment:4 Changed 11 years ago by Jean-Paul Calderone

Suggestions for a unit test for this behavior?

comment:5 Changed 11 years ago by Jean-Paul Calderone

Here's a suggested fix for the problem. Still no ideas for unit testing, but perhaps this suggestion will lend itself to some ideas.

Get rid of sigInt, sigBreak, sigTerm and _handleSigchld. Add a single signal handler utility method that conceptually does nothing more than:

signals.append('name of signal')

The reactor I/O thread will check signals and dispatch to registered handlers as necessary.

comment:6 in reply to:  1 Changed 11 years ago by Glyph

A simpler fix would simply be to remove the callLater branch entirely. The only reason that was there was that previously the waker pipe was blocking; it was long ago made non-blocking for other reasons.

I don't see any reasonable way to unit-test this behavior, since it is completely asynchronous and may occur at any time. However, sigBreak, sigTerm and _handleSigchld are definitely ugly and could be removed and replaced with one function, perhaps by a separate ticket... it really seems like we ought to be able to do something else by centralizing signal handling but I can't think of what.

Changed 11 years ago by Glyph

Attachment: nocalllater.diff added

my suggested fix, hopefully incredibly simple

comment:7 Changed 11 years ago by jknight

Cc: jknight Jean-Paul Calderone Glyph added

Look, I've no idea what ya'll are going on about, but the problem is in wakeUp itself, as I thought had been made clear above. Below patch should be applied.

Index: twisted/internet/
--- twisted/internet/    (revision 17708)
+++ twisted/internet/    (working copy)
@@ -296,11 +296,10 @@
     def wakeUp(self):
         """Wake up the event loop."""
-        if not threadable.isInIOThread():
-            if self.waker:
-                self.waker.wakeUp()
-            # if the waker isn't installed, the reactor isn't running, and
-            # therefore doesn't need to be woken up
+        if self.waker:
+            self.waker.wakeUp()
+        # if the waker isn't installed, the reactor isn't running, and
+        # therefore doesn't need to be woken up
     def doIteration(self, delay):
         """Do one iteration over the readers and writers we know about."""

comment:8 Changed 11 years ago by Jean-Paul Calderone

Yes, that patch probably fixes the problem. I was suggesting simplifying the entire signal handling implementation, since it is basically impossible to unit test. As far as I can tell, your patch is a necessary part of the solution I was suggesting.

comment:9 Changed 11 years ago by Glyph

Oh, yes. A necessary part of mine as well. Without the change that I suggested, the waker won't even be invoked in the SIGCHLD case (although your fix will fix _other_ signal-handling race conditions).

comment:10 Changed 11 years ago by Glyph

Whoops. I'm totally wrong.

The patch I posted is probably a good idea (along with other changes that would make it actually work in a thread-less environment) but is a separate issue entirely. Thread support has nothing to do with whether you are running in a thread or not.

For the normal case James's patch is adequate. I think we should apply it, because I still can't think of a way to test it.

comment:11 Changed 11 years ago by Jean-Paul Calderone

Quotient's unit tests semi-reliably produce some kind of process reaping bug. FWIW, applying James' patch above didn't fix it. This might be a trial bug rather than a real reactor bug, but I can't see how. I don't understand why the patch wouldn't have fixed it otherwise.

comment:12 Changed 10 years ago by Glyph

Resolution: wontfix
Status: newclosed

The bug is more insidious than we had previously thought.

There is an inherent race in Python here, perhaps an unfixable one. Everywhere from the last trip through the ceval loop to the start of the actual 'select' syscall, a signal may arrive, be recognized by Python's signal handler, filed for later execution, and then select runs with its given timeout (probably None), which blocks forever.

JP verified this by inserting a raise(SIGINT) into the line before the actual select call in selectmodule.c. A Python SIGINT handler installed with that modified select module would only run _after_ the select call.

The only potentially correct solution in Python is described in #791. We may add an optimization in C later, which would register a syscall that would directly write to the waker's file descriptor, without waiting for Python code to execute, thus obviating the need for the looping call.

comment:13 Changed 10 years ago by jknight

Resolution: wontfix
Status: closedreopened

Reopening, as this fix should be made anyways.

comment:14 Changed 10 years ago by Glyph

Priority: normallowest
Summary: signals handled via callFromThread, yet the reactor still drops into select()perhaps SIGCHLD handler could be slightly simpler

I think the fixes in #791 will probably obsolete this fix, and it doesn't actually "fix" anything, it just makes the race infinitesimally less race-y. The thread-support thing is also silly, and should also be applied (along with other changes) purely in the interests of code cleanliness, but it also isn't a fix for any real issue.

comment:15 Changed 10 years ago by Glyph

Description: modified (diff)
Owner: Glyph deleted
Status: reopenednew
Summary: perhaps SIGCHLD handler could be slightly simplerperhaps wakeUp could be slightly simpler

Changing the description too, since apparently we're repurposing this ticket for a minor change rather than the actual fix:

The original description (thanks trac) was:

reactor.iterate() is defined like this:

    def iterate(self, delay=0):
       """See twisted.internet.interfaces.IReactorCore.iterate.

runUntilCurrent runs all the procs on the threadCallQueue. So if
something is added to the threadCallQueue between runUntilCurrent()
and doIteration(), the reactor needs to have an fd ready for reading
to shortcut the select(). This is done by callFromThread() calling
reactor.wakeUp(), which will write on the wakeup FD.

HOWEVER. For some reason reactor.wakeUp() only writes on the fd if it
is being called from another thread. This is obviously borked in the
signal-handling case, when a signal arrives between runUntilCurrent()
and doIteration(), and is processed via reactor.callFromThread(), as
is the case with SIGCHLD.

Solution: have reactor.wakeUp() always wake the waker, regardless of
the thread it comes from.

The solution is incorrect, for reasons given above.

comment:16 Changed 9 years ago by Jean-Paul Calderone

In addition to #791, see #733 and #2535.

comment:17 Changed 9 years ago by Glyph

Owner: set to Glyph
Status: newassigned

comment:18 Changed 8 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/fix-wakeup-1997

(In [25654]) Branching to 'fix-wakeup-1997'

comment:19 Changed 8 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Glyph deleted
Status: assignednew

Put the fix james recommended into the branch, plus sort of kind of a unit test maybe a bit.

comment:20 Changed 8 years ago by washort

Keywords: review removed
Owner: set to Jean-Paul Calderone

well, the test fails without the change, and passes with it, so I'd call that a success :) Looks good to merge.

comment:21 Changed 8 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [25693]) Merge fix-wakeup-1997

Author: exarkun Reviewer: washort Fixes: #1997

Change the reactor wake up method so that it tries to wake up the reactor no matter what thread it is called from. This avoids a potential deadlock where the wake up method is called from a signal handler in the main thread after the thread call queue has been checked but before the event notification API (ie select, poll, etc) has been entered.

Note: See TracTickets for help on using tickets.