Opened 15 years ago

Closed 2 years ago

#761 defect closed fixed (fixed)

Twisted doesn't exit with proper exit code after SIGINT

Reported by: jknight Owned by: Glyph
Priority: high Milestone:
Component: core Keywords:
Cc: Jean-Paul Calderone, jknight, miracle2k Branch: 761-signal-exit
branch-diff, diff-cov, branch-cov, buildbot
Author: glyph

Description (last modified by Glyph)

I think all the feedback has been addressed here adequately, and assuming this gets a passing build, I'd approve it.

However, the NEWS file has a typo: it should be "a zero exit code" not "a exit code", so please make that adjustment :).

Attachments (4)

fix-twistd-exit-code.patch (4.0 KB) - added by Martin Gergov 6 years ago.
fix-twistd-exit-code.2.patch (4.3 KB) - added by Martin Gergov 6 years ago.
fix-twistd-exit-code.3.patch (9.7 KB) - added by Martin Gergov 6 years ago.
fix-twistd-exit-code.4.patch (10.4 KB) - added by Martin Gergov 6 years ago.
Support for Windows

Download all attachments as: .zip

Change History (50)

comment:1 Changed 15 years ago by jknight

After receiving a SIGINT, twisted should exit with a SIGINT result code.

It should do this by exiting with:
 signal.signal(signal.SIGINT, signal.SIG_DFL)
 os.kill(os.getpid(), SIGINT)
after shutting down the reactor. 
Either that, or re-raise a KeyboardInterrupt, which would let a future version of 
python return the right exit code to the OS. (I submitted python bug 1054041 for this too). 

http://www.cons.org/cracauer/sigint.html

comment:2 Changed 15 years ago by Jean-Paul Calderone

This is something twistd could do (I'm guessing that's what you meant).  I don't
think reactor.run() should, though.  Perhaps it could return status information,
though.

comment:3 Changed 10 years ago by miracle2k

Cc: miracle2k added

comment:4 Changed 10 years ago by Jean-Paul Calderone

See also #718.

comment:5 Changed 9 years ago by Jean-Paul Calderone

Keywords: twistd added

comment:6 Changed 9 years ago by <automation>

Owner: jknight deleted

Changed 6 years ago by Martin Gergov

Attachment: fix-twistd-exit-code.patch added

comment:7 Changed 6 years ago by Martin Gergov

Keywords: review added

Note that GNU programs flip the highest bit for errors(SIGINT | 128) and probably a lot more standards exist so a choice should be given( #718) for exit status.

comment:8 Changed 6 years ago by jknight

Calling exit with signal.SIGINT doesn't result in the correct exit code.

The correct exit code cannot be generated by the exit syscall at all, only by kill; please re-read http://www.cons.org/cracauer/sigint.html

comment:9 Changed 6 years ago by Martin Gergov

As far as I understood the procedure from How to be a proper shell is what the behaviour of twisd should be. So the problem is actually waiting for all child processes to exit before killing twistd. Does this mean support only for twisted spawned processes or maybe something more low-level ?

comment:10 Changed 6 years ago by Jean-Paul Calderone

Consider:

exarkun@top:~$ python
>>> import os, signal
>>> signal.signal(signal.SIGINT, signal.SIG_DFL)
<built-in function default_int_handler>
>>> os.kill(os.getpid(), signal.SIGINT)

exarkun@top:~$ echo $?
130
exarkun@top:~$ python
>>> import os
>>> import os, signal
>>> import sys
>>> sys.exit(signal.SIGINT)
exarkun@top:~$ echo $?
2

James' comment is just about the incorrect use of sys.exit instead of os.kill to wrap things up.

Changed 6 years ago by Martin Gergov

comment:11 Changed 6 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to Martin Gergov

Thanks for working on this one.

  1. The changes in bin/twistd are untested. Since it's hard to test code in bin/twistd, you probably want to avoid adding more code here. I suspect that importing the reactor here breaks reactor selection, anyway. Code for implementing the behavior of twistd belongs inside the run function for the twistd script anyway.
  2. The approach of adding an _exitStatus to the reactor's shipped with Twisted will get the job partly done. However, not all reactors are part of Twisted. This change will break third-party reactors. It's fine if third-party reactors need to do something to opt-in to this behavior, but if they don't then the behavior that results shouldn't be any *worse* than it is in trunk right now. This suggests a new optional reactor interface - or, perhaps better, perhaps this information can be tracked elsewhere, somewhere inside the twistd implementation perhaps, so that no support from reactors is required.
  3. The test needs to be more careful about doing cleanup. Assertions fail by raising exceptions; any cleanup after an assert won't run if the test fails.
  4. The test should also be split into several different tests. Each test should assert just one thing.
  5. If the implementation does end up requiring third-party reactors to implement this functionality themselves, then the tests should be re-usable so that authors of third-party reactors can apply them to their reactor implementation to be sure they've done things right.
  6. Will this work on Windows? Signals on Windows are weird.
  7. If you can think of a way to test this without *actually* delivering SIGINT and SIGTERM to the test process, that'd probably be better. If the functionality isn't implemented correctly, it seems likely that this test may fail by aborting the entire test suite run. That's not particularly nice.

Thanks again.

Changed 6 years ago by Martin Gergov

comment:12 Changed 6 years ago by Martin Gergov

Keywords: review added
Owner: Martin Gergov deleted
  1. Everything moved to twistd.py
  1. Made TwistdApplicationRunner, specific to twistd. For now just preserves

the exit code and signal handlers. I didn't knew there are reactors that don't inherit the base reactor in some way. If you have time could you please point me to some examples ? Also, it's quite obvious I copied the handlers from base.py, are the handlers a requerment for all reactors, because if they are the assign to _exitStatus could be done in wrappers around the handlers.

  1. Good point. Cleanup is moved before assertions.
  1. Done. Sliced everything.
  1. -
  1. I am currently moving to Denmark and I have no access to a Windows machine right now(or a stable internet connection for that matter). I have something in mind, perhaps this week I will get my hands on a virtual machine with 7. Maybe it will be a good idea to get FreeBSD up also. For the question: only suspicions, although I included SIGBREAK for which I found very little documentation unfortunately.
  1. MockOS and MockSignal from test_process.py seem to do the job.

I need to eat something now before I pass out.

Changed 6 years ago by Martin Gergov

Support for Windows

comment:13 Changed 6 years ago by Martin Gergov

Windows update. After extensive research of the way that Windows handles Ctrl-C and others I am still unsure how it works. However I have gathered some resources that might be helpful in future endeavors: http://msdn.microsoft.com/en-us/library/windows/desktop/ms682541(v=vs.85).aspx http://msdn.microsoft.com/en-us/library/ms811896.aspx#ucmgch09_topic3 http://msdn.microsoft.com/en-us/library/windows/desktop/ms681381(v=vs.85).aspx

Things like: "When a Ctrl+C interrupt occurs, Win32 operating systems generate a new thread to handle the interrupt. This can cause a single-thread application, such as one ported from UNIX, to become multithreaded, potentially resulting in unexpected behavior." prove that perfect behaviour cannot really be achieved, but as far as automatic and manual testing goes - it works.

Next is FreeBSD.

comment:14 Changed 6 years ago by Martin Gergov

Works on FreeBSD 9.1 without change. A curious thing however is the default handling of SIGINT - the exit status is 1 for some reason(and the same is true for programs like wc and cat). I am trying to investigate why this inconsistency occurs - SIGTERM ends twistd with 143 which is sane, again same goes for wc and cat. This is of course BSD related and in this case twistd does what it needs to whitout specific code.

comment:15 Changed 6 years ago by Glyph

Author: glyph
Branch: branches/fix-twistd-exit-code-761

(In [40471]) Branching to 'fix-twistd-exit-code-761'

comment:16 Changed 6 years ago by Glyph

comment:17 Changed 6 years ago by Glyph

Also forced a build, so it should be ready if anyone else is interested in doing a review.

comment:18 Changed 6 years ago by Glyph

Keywords: review removed
Owner: set to Martin Gergov

Sorry for the delay on the review, especially since it's so short.

Thanks for your contribution. It would be great to get this fixed.

Unfortunately it looks like the buildbots are mostly red for this one, and the test failures look pretty straightforward. Plus, there are new errors from pyflakes.

Finally, tests' docstrings should say what the system under test does, not what the test itself does; they should be in the declarative case, not imperative. In other words, instead of "Assert that a bazzed foo is bar." or "Bar a foo and then check that it is bazzed", instead, "Barring a foo will cause it to be bazzed" or "Calling bar on a Foo will baz it." Does that make sense?

Thanks again. Please resubmit when you can!

-glyph

comment:19 Changed 3 years ago by Adi Roiban

Branch: branches/fix-twistd-exit-code-761fix-twistd-exit-code-761

There is this duplicate #8477 which has an associated PR https://github.com/twisted/twisted/pull/190

comment:20 Changed 3 years ago by Glyph

Keywords: review added; twistd removed
Owner: Martin Gergov deleted

The duplicate was in review, so this one should be too.

comment:21 Changed 3 years ago by Adi Roiban

Keywords: review removed
Owner: set to Ross Perkins

Here is my review for that PR 190.


Changes look good.

Can you please update the documentation to include an example and describe this behaviour.

As part of the review, I can run the manual tests and check that all is OK.

I think that we need at least a few mocked unit tests to make sure we don't introduce any regression with the future changes.

A non-mocked functional test would be best. We have reactor.spawProcess and it should be possible to write an automated end-to-end tests for this change.


Here is the feedback from ross-p.

OK. I understand. In that case, I leave this to you guys to decide if you want it or not.

The code changes are simple. There is a manual test that has been confirmed to work.

Feel free to merge the working code if you want, even though there are no automated tests.

Or you can keep the buggy code which also does not have automated tests.

Thanks and have a nice day. :)


I am removing this from the review queue as the PR has no automated tests.

comment:22 Changed 3 years ago by Jason Litzinger

This ticket is fairly interesting, and, unless there are objections, I'm going to assign it to myself and try for a fix.

Before I dive in, it's worth trying to (briefly) document and coalesce all prior submissions/discussions here, and at least outline my train of thought.

Tickets that are either related to, or mention this ticket are:

#718 #2182 #8477

There have been two (more or less) approaches:

  1. Fix twistd so that it returns a proper exit code to the shell.
  2. Update the reactor so that it exits with an exit code appropriate to the signal.

In various comments concern has been raised regarding #2. Additionally, concern was raised regarding importing the reactor in twistd.

My proposal for fixing the issue is:

  1. The reactor should not call exit, that's an assumption on client application structure that isn't necessarily valid. If a client has code that currently runs after the reactor then calling exit in the reactor would introduce a nasty regression.
  1. The reactor _should_ provide an API to determine what, if any, signal caused it to exit. ReactorBase already deliberately chooses to handle signals, so it already cares about them, why not provide that information to the user?
  1. twistd, or rather the underlying application runner, should use this(new) API to properly exit with a signal if that was the reason for exit.

The obvious downside is adding a new public API to the reactor (ReactorBase at the least, but anything that inherits _SignalReactorMixin or handles signals should get updated. This isn't a trivial downside by any means, but the information already exists (it is passed to the handler), why not capture and expose it?

I don't feel it is worth fixing twistd exclusively because, as stated, the reactor has already chosen to handle signals. That means anyone using a reactor on a POSIX platform has signal handlers installed. It is reasonable to assume they might be interested in why the reactor stopped regardless of whether they are using twistd.

A trivial example (not using twistd) would be something like (simple_tw.py contents follow):

$ cat simple_tw.py
import os
import signal
import sys

from twisted.internet import reactor

reactor.callWhenRunning(os.kill, os.getpid(), signal.SIGINT)
reactor.run()
if reactor.getExitSignal():
    sys.exit(128 + reactor.getExitSignal())

$ python simple_tw.py
$ echo $?
130

comment:23 Changed 2 years ago by Jason Litzinger

Keywords: review added
Owner: Ross Perkins deleted

My first pass pull request for this ticket is here:

https://github.com/twisted/twisted/pull/826

The change modifes any reactor inheriting from ReactorBase, which currently captures signals, to save the signal number in class attribute so that clients may use this information later. With this information, twistd and twist now send the caught signal to themselves with os.kill (the former obtains the attribute from the selected runner).

In general, I avoided modifying exiting public APIs in any way. For example, runReactorWithLogging could have just returned the selected reactor, instead of forcing the ApplicationRunner to choose it, but this, while hopefully harmless, might have caused grief to some existing application.

I also avoided having the reactor call exit, similar to previous submissions due to reasons cited above.

comment:24 Changed 2 years ago by Jason Litzinger

Branch: fix-twistd-exit-code-761761-signal-exit

comment:25 Changed 2 years ago by Jason Litzinger

Btw, I changed the branch in the ticket, but now I'm not certain that was the right thing to do as that expects it to exist in the twisted repo. I guessing this may have been something more heavily used in the SVN days?

comment:26 in reply to:  25 Changed 2 years ago by Glyph

Replying to Jason Litzinger:

Btw, I changed the branch in the ticket, but now I'm not certain that was the right thing to do as that expects it to exist in the twisted repo. I guessing this may have been something more heavily used in the SVN days?

Yeah. It still does some stuff, but it's pretty much only useful if you're a project member and can push to the twisted repo itself. The important thing is just making sure you've made it clear where the change in review is, which you've done nicely, thank you :).

comment:27 Changed 2 years ago by Glyph

Keywords: review removed
Owner: set to Jason Litzinger

Thanks for your contribution!

Quick code review is here:

https://github.com/twisted/twisted/pull/826#pullrequestreview-45925076

bottom line - let's see how minimal we can make the public API of this; can it all be factored privately? If you could make these changes and resubmit, or explain where you're getting stuck with them, that would be great. Thanks again.

comment:28 Changed 2 years ago by Jason Litzinger

Thanks for the feedback!

I agree with reduciing the public API in this PR. There are a few things I'd like to discuss below, to get this where the community would like it to go, otherwise I'll fix ASAP.

The biggest issue I encountered was how to nicely get the caught signal out of the reactor. I chose to add the public 'exitSignal' class attribute because I thought this kind of information might be useful to general users of the reactor.

However, Glyph's comment "This attribute appears on some implementations of the reactor now but not others, it's never documented on an interface." is dead on. Unfortunately, this is true in general with how signals are handled -- some reactors (those that inherit the _SignalReactorMixin) catch them, some don't (1).

An alternate approach is to keep the class attribute private and only allow twistd/twist to use it to properly terminate. The risk is that this information is useful enough that users start accessing the private attribute. Thoughts?

Two additional alternatives come to mind, both add public APIs, but expose information regarding caching of signals/why the reactor exited.

  1. Make an interface that makes the handling of signals a deliberate and advertised trait of a reactor (ISignalHandlingReactor).
  1. Add an API that allows a user to fetch a reason for why the reactor exited. In the case of signals it could be "SIGINT", or something that represents that. In the case of calling crash it would be "crash". This would touch a lot of code.

On the subject of testing, a more complete alternative to what I did is one suggested above -- spawn a process and send it a signal. My concern is how well that will work on all platforms. Since signals, processes, and exit status' are fairly specific to each platform several of the tests might have to be skipped/ duplicated to make sure there is coverage on each. As an example complication:

http://bugs.python.org/issue18040 (2)

Despite that, is spawning a process preferred?

Thanks again!

Notes:

(1) This might be worth addressing because it isn't clear to users from the documentation that signal handlers are being installed. Is that a big deal? Probably not, unless there's any way for a user to install signal handlers that get overwritten. SIGINT won't, but SIGTERM doesn't look for any default.

(2) This does imply that the existing sigInt path won't do much on windows with 2.7, meaning neither will this change.

comment:29 Changed 2 years ago by Jason Litzinger

Wondering if I might get a suggestion. In response to review feedback, I'm trying to improve the tests for this bugfix by actually running twist. The problem is, independent of whichever plugin I run, it takes time for the reactor to come up and install the signal handlers. In this case, I'm launching a web server, then killing it. My options are:

  1. Launch, created a DelayedCall after the process starts and hope its enough (below).
  2. Launch, keep trying to connect until I succeed.
  3. Install a "fake" plugin that gives deterministic output on stdout when it is alive and running.

Any suggestions on how to intelligently wait until the handlers are installed?

Any non twistd examples to look at (twist, unlike twistd, doesn't allow one to specify any script to run, so the twistd examples don't help as much as I'd like)?

Is 3 above possible?

Some code (this passes, without this patch, fails because signal is 0.)

class ProcessStopNotifier(StartStopProcessProtocol):
    """Notify users of process start/stop"""

    def processEnded(self, reason):
        print(reason.getErrorMessage())
        self.stopped.errback(reason)


class SpawnTwistTests(twisted.trial.unittest.TestCase):

    def test_runTwistAndSendSIGINT(self):
        from twisted.internet import reactor
        cmd = sys.executable
        p = ProcessStopNotifier()

        reactor.spawnProcess(p, cmd, [cmd, '-m', 'twisted', 'web'], env=None)

        def onStart(_):
            reactor.callLater(5, p.transport.signalProcess, "INT")

        p.started.addCallback(onStart)

        def checkExit(reason):
            self.assertEquals(reason.value.signal, 2)

        return p.stopped.addErrback(checkExit)

I've also tried inlineCallbacks as some tests do, but again, I need to wait until the signal handlers are installed.

comment:30 Changed 2 years ago by Jean-Paul Calderone

Launching a child process in a unit test has a wide range of problems. It may be desirable to have this kind of _integration_ test somewhere but unless you avoid all possible branching in your integration testing, you'll either end up with an unusably slow test suite or a test suite that misses so many cases it's hardly worth having.

Additionally, running a child Python process that shares the Python execution environment of the parent Python process is fraught with all kinds of terrors. I'm skeptical that it's actually possible - if it is, it's certainly far more complex than the code that you've included in this comment.

What specific behavior are you attempting to cover with this new test? If the specific behavior is "that the child process exits with the correct code on receiving SIGINT" then I recommend narrowing your focus a bit. If this test is a response to glyph's comments about monkey-patching, consider _parameterizing_ things instead - rather than trying to run the whole thing in a child process.

comment:31 Changed 2 years ago by Jason Litzinger

Thanks, this is the kind of feedback I was hoping to receive. I was leaning towards following a different path, this solidifies that thought.

Launching a child process in a unit test has a wide range of problems.

Something I suspected but didn't fully appreciate until trying to do it. It's done elsewhere in the unit test suite, albeit with a process that exits, so I was hoping to follow a similar pattern.

Additionally, running a child Python process that shares the Python execution environment of the parent Python process is fraught with all kinds of terrors.

That definitely wasn't the intent, I was under the assumption (on Linux) that a fork/exec was taking place and wasn't trying to patch/modify anything in the newly spawned process' environment. The code above is deliberately terse, and has a few problems (it can leave a background process in some errors). Regardless, not the intent.

What specific behavior are you attempting to cover with this new test? If the specific behavior is "that the child process exits with the correct code on receiving SIGINT".

Yep.

If this test is a response to glyph's comments about monkey-patching, consider _parameterizing_ things instead - rather than trying to run the whole thing in a child process.

It was somewhat of a deliberate over-response. The common pattern in the existing twist tests is to patch the reactor install. In my case, I was also creating a new options class and patching that too. The new options class isn't necessary (already gone in my working tree), but I was hoping to further and run as much of the process as possible. This does seem like an integration test, but when the context is the exit of a new process, it seemed worth exploring (test_process does similar things).

As indicated above, I'm going to abandon spawning the process, I'll look into your suggestion.

comment:32 Changed 2 years ago by Jason Litzinger

Keywords: review added
Owner: Jason Litzinger deleted

Round two, here goes. Thanks to my reviewers for taking the time to comment. Any ommission not documented below is unintentional and can be fixed in a v3.

PR: https://github.com/twisted/twisted/pull/826

Changes in v2

  • Make all new APIs private.
  • Move the handling of exit in Twist into the run method for better testability.
  • The above Incorporates nearly all review feedback. The exception is I still re-deliver the signal. The PR has some discussion on this.
  • Heavily refactored the testing of Twist so that most of the odd, service level patching was not necessary. The remaining patches are for the method under test and to prevent twist from starting logging. I did try using a dummy logger, but other tests failed (only if the entire suite was run).

comment:33 Changed 2 years ago by Glyph

Keywords: review removed
Owner: set to Jason Litzinger

Thanks again for your very thorough contribution! Very nearly there - please see my GH review: https://github.com/twisted/twisted/pull/826#pullrequestreview-48748384

comment:34 Changed 2 years ago by Jason Litzinger

As always, thanks for the review!

One brief comment tonight on your review comment, and I'll comment in the PR where appropriate (probably over the weekend).

If you think it would be too much work to address those we can probably merge it as-is with some follow-up tickets.

Nah, lets not do that. I'd rather have the patches as good as possible before they are merged, so I'm more than happy to iterate even if it's a chunk of work. I'll look at your feedback and rework some patches. As usual I may post here if I get stuck somewhere. Onward to v3!

One sidenote, if I don't submit v3 next week it may be another couple weeks as I'm going out of town and I'm not sure if I'll have connectivity. I'll submit when I return though.

comment:35 Changed 2 years ago by Glyph

If you can't, please feel free to agitate on the mailing list for someone else to pick it up before you go. It is after all very close. We don't have as good visibility into the "reviewed but not merged" queue as the "review" queue, so it's easy for stuff like this to fall through the cracks.

comment:36 Changed 2 years ago by Jason Litzinger

I'll have connectivity, so submitting shouldn't be a problem.

On that note, I have most of the changes done, not pushed, but I'm a bit stuck and would welcome feedback on this review comment: https://github.com/twisted/twisted/pull/826/files/add82a87be83592b26ea21bcdae47b6b55144f91#diff-47154e5e46042922f7bf42744f80447d

If I don't hear anything in a day or so I'll probably submit v3 and, as always v4 and beyond based on review.

comment:37 Changed 2 years ago by Jason Litzinger

Keywords: review added
Owner: Jason Litzinger deleted

Apologies for the delay, and the abrupt message here, wanted to get this out for review before I head out for a few days.

Changes in v3:

  • Introduce _ISupportsExitSignalCapturing and add implement with ReactorBase. A discussion of why this was done is in the commit.
  • Refactor the various users of _exitSignal to use the new interface.
  • Various whitespace fixes.
  • Push patching back in test_twistd to the os module. Still patching, but at a better boundary.
  • I did not make any changes to test_twist, mostly because there would still be patching. If any of the options I proposed (see above) would be better to have in this round I am happy as always to do a v4.

Thanks in advance! -Jason

comment:38 Changed 2 years ago by Jason Litzinger

Keywords: review removed
Owner: set to Jason Litzinger

Oops, I missed some changes in _twist, will fix and put back into review.

comment:39 Changed 2 years ago by Jason Litzinger

Keywords: review added
Owner: Jason Litzinger deleted

Changes in v4:

  • Check whether reactor implements the _ISupportsExitSignalCapturing interface in twist and re-deliver the signal if one was caught.
  • Eliminate the catch-all try/except flagged during review.
  • Whitespace fixes.

The missing coverage is in the test, there is an exception case if importing signal throws an ImportError. I'm happy to remove this if need be, however, since it wasn't yet imported in the base tests I chose to err on the side of caution. Happy to remove if desired.

comment:40 Changed 2 years ago by Glyph

Description: modified (diff)
Keywords: review removed
Owner: set to Jason Litzinger

comment:41 Changed 2 years ago by Jean-Paul Calderone

glyph accidentally left his last comment in the ticket description.

comment:42 Changed 2 years ago by Jason Litzinger

Hah, I was very confused for a moment, thanks Jean-Paul. I'll address those changes this evening, and should I correct the ticket for historical purposes? Correct here means:

  1. Move Glyph's comment into a comment
  2. Restore the old description

?

comment:43 Changed 2 years ago by Jason Litzinger

Keywords: review added
Owner: Jason Litzinger deleted

Requested changes to the newsfragment made. Thanks for all the reviews.

comment:44 Changed 2 years ago by Glyph

Keywords: review removed
Owner: set to Glyph

At long last: L G T M. I have a merge commit locally and I'll push it as soon as I see that green check mark pop up on the PR.

Thanks ever so much for your enduring patience :).

comment:45 Changed 2 years ago by Jason Litzinger

No worries at all, everyone has a life with plenty going on! I appreciate all the time everyone took on reviews.

comment:46 Changed 2 years ago by Glyph <glyph@…>

Resolution: fixed
Status: newclosed

In 0d66019:

Merge remote-tracking branch 'origin/761-signal-exit' into trunk

Author: jlitzingerdev

Reviewer: exarkun, adiroiban, glyph

Fixes: ticket:761

If twist or twistd exit with a signal it now delivers that signal to itself
instead of exiting normally. On Unix platforms this results in a nonzero exit
code where previously a zero exit code was returned.

Note: See TracTickets for help on using tickets.