Ticket #2726 (new defect )

Opened 2 years ago

Last modified 8 months ago

there is no way to kill a group of subprocesses in Twisted (with proposed fix for Windows using Job objects)

Reported by: deboute Assigned to: therve
Type: defect Priority: normal
Milestone: Component: core
Keywords: Cc: exarkun, bhearsum, Trent.Nelson, rcampbell, zseil
Branch: branches/signal-process-group-2726-3 Author: therve
Launchpad Bug:

Description

i apologize in advance if this has been previously reported, i only found a reference to it in the thread of ticket #2415

The problem i report (and give a patch to fix it) is quite straightforward.

On windows, if a process created via the Process interface spawns new childs, they won't be killed via Process.SignalProcess('KILL'), only the top process will be

In order to fix this, i've modified _dumbwin32proc.py in order to reflect the counsels of an article dating from last century (last paragraph) http://www.microsoft.com/msj/0698/win320698.aspx

Note that it does not work on windows equal or under win9x/ME

Attachments

_dumbwin32proc.patch (2.5 kB) - added by deboute 2 years ago.
_dumbwin32proc.py patch that enables killing of child processed

Change History

  2007-06-27 10:49:49+00:00 changed by deboute

  • attachment _dumbwin32proc.patch added

_dumbwin32proc.py patch that enables killing of child processed

  2007-06-27 13:32:09+00:00 changed by exarkun

  • cc set to exarkun

This looks like a really neat patch. Thanks for contributing it!

I'm not sure the signalProcess interface is ultimately the correct one through which to expose this functionality. Its behavior on Windows is already pretty crummy, specifically due to differences between it and POSIX behavior.

I'm not really sure what the right API should look like, but perhaps it would make sense to expand the IProcessTransport interface to include an explicit method for terminating the process. This behavior might make more sense in that method than in signalProcess.

We should definitely try to include this somewhere. It's definitely an important feature for interacting with processes on windows.

  2007-06-27 16:18:34+00:00 changed by glyph

We need to figure out some way to test this.

As far as the interface: signals in general are a sore spot for most operating systems, even if I weren't using the term "operating systems" lightly enough to include things like Windows. I think that signalProcess is a reasonable interface for this to go through, at least for the forseeable future.

Maybe eventually we'll have some better portable way to talk to processes, but I think that the unportability of the details of signalProcess's semantics is just the violence inherent in the system.

In any case, this makes signalProcess do what it says it does on Windows. Discussion of the API should be a separate ticket that deals with all platforms.

  2007-06-27 16:21:43+00:00 changed by exarkun

The crappiness of signals is exactly the reason not to put this into signalProcess. There are no signals in the attached patch.

Besides, this change is totally backwards incompatible.

  2007-06-27 16:38:37+00:00 changed by glyph

  • summary changed from process.signalProcess('KILL') doesn't kill child process on windows (with proposed fix) to there is no way to kill a group of subprocesses in Twisted (with proposed fix for Windows using Job objects)

Oh, you're right. Pardon me, I completely misread the patch. There is another thing which is like this, which deals with a different set of problems related to win32 and processes and termination.

To be specific: signalProcess does not kill child processes on POSIX, and it should not kill child processes on Windows. I don't know nearly enough to comment intelligently on process groups and PGIDs, so I won't.

  2008-02-29 01:38:46+00:00 changed by rhelmer

  • branch deleted
  • author deleted

This patch seems to work well, and resolves http://buildbot.net/trac/ticket/77

Is there any reason not to take this patch?

  2008-02-29 01:39:55+00:00 changed by rhelmer

  • author set to deboute

Sorry, didn't mean to change any fields :(

follow-up: ↓ 8   2008-02-29 04:49:51+00:00 changed by glyph

  • owner changed from glyph to rhelmer

If you'd like to shepherd a patch to being merged, please see our page on the review process.

Reasons not to accept this patch:

  • It doesn't include any tests. This is really a deal-breaker - every change to Twisted needs tests.
  • It provides a new feature while providing no documentation.
  • It is not backwards compatible, as signalProcess will now have new semantics, providing no control to even explicitly select the old semantics.
  • It only provides this functionality on Windows. For a core reactor API like this, you really need portable behavior between multiple platforms.

Please feel free to fix these issues and submit it for review, though.

in reply to: ↑ 7   2008-02-29 08:15:30+00:00 changed by rhelmer

  • status changed from new to assigned

Replying to glyph:

If you'd like to shepherd a patch to being merged, please see our page on the review process. Reasons not to accept this patch: * It doesn't include any tests. This is really a deal-breaker - every change to Twisted needs tests. * It provides a new feature while providing no documentation. * It is not backwards compatible, as signalProcess will now have new semantics, providing no control to even explicitly select the old semantics. * It only provides this functionality on Windows. For a core reactor API like this, you really need portable behavior between multiple platforms. Please feel free to fix these issues and submit it for review, though.

Great, thanks for the info.

Digging into this more, Buildbot is using os.kill() on posix, which (apparently) kills subprocesses, too, if you call it with the negative of the PID.

However, os.kill isn't available on win32 (not surprising because process control is so different!). I'd be happy to help out with the above list, but I'm not really sure from the discussion here (given my limited exposure to Twisted) where the proper place for this is.

Only things I've thought of:

a) signalProcess should be able to optionally kill the process group also (cross-platform) b) signalProcess should not be used, and Twisted users (like Buildbot) should have to do os.kill() on posix, whatever else on other platforms

(a) seems nicer to me, (b) is totally doable but pushes more work onto the Twisted user that maybe should be hidden. Anything I'm missing?

  2008-02-29 08:52:46+00:00 changed by glyph

From the "kill(2)" manpage:

       If  pid  is less than -1, then sig is sent to every process in the pro‐
       cess group -pid.

Process groups aren't quite just "child processes". I don't believe Twisted has any APIs for dealing with process groups currently; perhaps signalProcessGroup API would be appropriate, if job objects are closely analogous to process groups (I have no idea, I don't know what a job object is).

  2008-03-04 15:02:29+00:00 changed by bhearsum

  • cc changed from exarkun to exarkun, bhearsum

  2008-04-11 01:54:01+00:00 changed by Trent.Nelson

  • cc changed from exarkun, bhearsum to exarkun, bhearsum, Trent.Nelson

  2008-04-23 11:53:07+00:00 changed by rcampbell

  • cc changed from exarkun, bhearsum, Trent.Nelson to exarkun, bhearsum, Trent.Nelson, rcampbell

  2008-04-23 11:58:02+00:00 changed by rcampbell

just to link back and confirm that we've tested this somewhat, we've installed this patch on a buildbot machine and verified it worked by dropping network connectivity to the box during a test run. Buildbot was able to successfully kill processes under these conditions.

https://bugzilla.mozilla.org/show_bug.cgi?id=420216

  2008-04-23 16:21:15+00:00 changed by therve

  • branch set to branches/signal-process-group-2726
  • author changed from deboute to therve

(In [23406]) Branching to 'signal-process-group-2726'

  2008-05-02 09:07:45+00:00 changed by therve

  • branch changed from branches/signal-process-group-2726 to branches/signal-process-group-2726-2

(In [23482]) Branching to 'signal-process-group-2726-2'

  2008-05-07 14:25:01+00:00 changed by therve

  • keywords changed from dumbwin32proc process to dumbwin32proc process review
  • owner changed from rhelmer to radix
  • status changed from assigned to new

  2008-05-18 22:41:54+00:00 changed by exarkun

  • keywords deleted
  • owner changed from radix to therve

(with radix and itamarst)

  • Can the new tests go into twisted.internet.test.test_process and use ReactorBuilder? :)
  • There's some problems with signalProcessGroup in the case where the child is not a session leader:
    • on posix, os.kill will fail with ESRCH; on windows, I dunno wtf will happen. There should be cross-platform error reporting (and tests for this of course).
    • it would be nice if spawnProcess(..., usePTY=False) could also do os.setsid if requested by the application. (It would also be nice if usePTY=True didn't force the os.setsid call.)
  • testing is fragile - make sure the grandchild was successfully created and that it exits with the correct status (ie, received SIGINT); ideally, this would all happen in the test function in the trial process (for maximum debuggability)
  • replicating the silent failure behavior of signalProcess in signalProcessGroup for the case where an unrecognized signal is specified is a bit unfortunate.
  • self.job = win32job.CreateJobObject(None, str(time.time())) - time.time()? Is that, like, a unique identifier? Maybe we could have a counter instead? time() has really low precision on Windows, too (and it can go backwards, etc) so it might not actually be unique.
  • The differences between process_creategroup.py and process_createjob.py seem not to be ideal. I don't know if there's anything to be done about that.

  2008-05-19 15:34:14+00:00 changed by therve

  • branch changed from branches/signal-process-group-2726-2 to branches/signal-process-group-2726-3

(In [23665]) Branching to 'signal-process-group-2726-3'

  2008-11-05 23:10:54+00:00 changed by zseil

  • cc changed from exarkun, bhearsum, Trent.Nelson, rcampbell to exarkun, bhearsum, Trent.Nelson, rcampbell, zseil
  • launchpad_bug deleted

Just some notes about the Windows part of this patch:

  1. The second parameter to CreateJobObject() can't be None. You have to use an empty string if you want an unnamed job object.
  2. It would be nice if _dumbwin32proc.Process.processEnded() could close the job and the process handle.
  3. If the Twisted process itself is running in a job, the AssignProcessToJobObject() call would fail. The subprocess should at least be created with CREATE_BREAKAWAY_FROM_JOB flag. Even then the assignment to the job might fail, if the job in which Twisted is running doesn't have the JOB_OBJECT_LIMIT_BREAKAWAY_OK limit set.
  4. On a similar note, the job object created by Twisted should also have that limit set, so that the subprocess can assign its children to its own job objects. Something like this should work:
    def setBreakawayOk(job):
        extendedLimits = win32job.QueryInformationJobObject(job,
                                         win32job.JobObjectExtendedLimitInformation)
        basicLimits = extendedLimits["BasicLimitInformation"]
        basicLimits["LimitFlags"] |= win32job.JOB_OBJECT_LIMIT_BREAKAWAY_OK
        win32job.SetInformationJobObject(job,
                                         win32job.JobObjectExtendedLimitInformation,
                                         extendedLimits)
    
  5. Windows supports console process groups, which are somewhat similar to POSIX process groups. You can send console control events to a process group, and the C runtime library will translate them to signals. It would be nice if this support could be added to signalProcessGroup, so that the subprocesses could terminate gracefully. See the monkeypatches in the patch for ticket:1764 for how this could be done.

  2008-11-05 23:39:17+00:00 changed by zseil

Oh, and another thing, the docstring for IProcessTransport.signalProcess() should be fixed to mention which signalID's are supported on all platforms. Right now it implies that "HUP" and "STOP" are portable, and forgets to mention the "TERM" signalID.

Note: See TracTickets for help on using tickets.