Opened 7 years ago

Last modified 3 years ago

#2726 defect new

there is no way to kill a group of subprocesses in Twisted (with proposed fix for Windows using Job objects)

Reported by: deboute Owned by: bhearsum
Priority: normal Milestone:
Component: core Keywords:
Cc: exarkun, bhearsum, Trent.Nelson, rcampbell, zseil, marcusl, maruel Branch: branches/signal-process-group-2726-4
(diff, github, buildbot, log)
Author: therve Launchpad Bug:

Description

i apologize in advance if this has been previously reported, i only found a reference to it in the thread of ticket #2415

The problem i report (and give a patch to fix it) is quite straightforward.

On windows, if a process created via the Process interface spawns new childs, they won't be killed via Process.SignalProcess('KILL'), only the top process will be

In order to fix this, i've modified _dumbwin32proc.py in order to reflect the counsels of an article dating from last century (last paragraph) http://www.microsoft.com/msj/0698/win320698.aspx

Note that it does not work on windows equal or under win9x/ME

Attachments (1)

_dumbwin32proc.patch (2.5 KB) - added by deboute 7 years ago.
_dumbwin32proc.py patch that enables killing of child processed

Download all attachments as: .zip

Change History (46)

Changed 7 years ago by deboute

_dumbwin32proc.py patch that enables killing of child processed

comment:1 Changed 7 years ago by exarkun

  • Cc exarkun added

This looks like a really neat patch. Thanks for contributing it!

I'm not sure the signalProcess interface is ultimately the correct one through which to expose this functionality. Its behavior on Windows is already pretty crummy, specifically due to differences between it and POSIX behavior.

I'm not really sure what the right API should look like, but perhaps it would make sense to expand the IProcessTransport interface to include an explicit method for terminating the process. This behavior might make more sense in that method than in signalProcess.

We should definitely try to include this somewhere. It's definitely an important feature for interacting with processes on windows.

comment:2 Changed 7 years ago by glyph

We need to figure out some way to test this.

As far as the interface: signals in general are a sore spot for most operating systems, even if I weren't using the term "operating systems" lightly enough to include things like Windows. I think that signalProcess is a reasonable interface for this to go through, at least for the forseeable future.

Maybe eventually we'll have some better portable way to talk to processes, but I think that the unportability of the details of signalProcess's semantics is just the violence inherent in the system.

In any case, this makes signalProcess do what it says it does on Windows. Discussion of the API should be a separate ticket that deals with all platforms.

comment:3 Changed 7 years ago by exarkun

The crappiness of signals is exactly the reason not to put this into signalProcess. There are no signals in the attached patch.

Besides, this change is totally backwards incompatible.

comment:4 Changed 7 years ago by glyph

  • Summary changed from process.signalProcess('KILL') doesn't kill child process on windows (with proposed fix) to there is no way to kill a group of subprocesses in Twisted (with proposed fix for Windows using Job objects)

Oh, you're right. Pardon me, I completely misread the patch. There is another thing which is like this, which deals with a different set of problems related to win32 and processes and termination.

To be specific: signalProcess does not kill child processes on POSIX, and it should not kill child processes on Windows. I don't know nearly enough to comment intelligently on process groups and PGIDs, so I won't.

comment:5 Changed 7 years ago by rhelmer

This patch seems to work well, and resolves http://buildbot.net/trac/ticket/77

Is there any reason not to take this patch?

comment:6 Changed 7 years ago by rhelmer

  • author set to deboute

Sorry, didn't mean to change any fields :(

comment:7 follow-up: Changed 7 years ago by glyph

  • Owner changed from glyph to rhelmer

If you'd like to shepherd a patch to being merged, please see our page on the review process.

Reasons not to accept this patch:

  • It doesn't include any tests. This is really a deal-breaker - every change to Twisted needs tests.
  • It provides a new feature while providing no documentation.
  • It is not backwards compatible, as signalProcess will now have new semantics, providing no control to even explicitly select the old semantics.
  • It only provides this functionality on Windows. For a core reactor API like this, you really need portable behavior between multiple platforms.

Please feel free to fix these issues and submit it for review, though.

comment:8 in reply to: ↑ 7 Changed 7 years ago by rhelmer

  • Status changed from new to assigned

Replying to glyph:

If you'd like to shepherd a patch to being merged, please see our page on the review process.

Reasons not to accept this patch:

  • It doesn't include any tests. This is really a deal-breaker - every change to Twisted needs tests.
  • It provides a new feature while providing no documentation.
  • It is not backwards compatible, as signalProcess will now have new semantics, providing no control to even explicitly select the old semantics.
  • It only provides this functionality on Windows. For a core reactor API like this, you really need portable behavior between multiple platforms.

Please feel free to fix these issues and submit it for review, though.

Great, thanks for the info.

Digging into this more, Buildbot is using os.kill() on posix, which (apparently) kills subprocesses, too, if you call it with the negative of the PID.

However, os.kill isn't available on win32 (not surprising because process control is so different!). I'd be happy to help out with the above list, but I'm not really sure from the discussion here (given my limited exposure to Twisted) where the proper place for this is.

Only things I've thought of:

a) signalProcess should be able to optionally kill the process group also (cross-platform)
b) signalProcess should not be used, and Twisted users (like Buildbot) should have to do os.kill() on posix, whatever else on other platforms

(a) seems nicer to me, (b) is totally doable but pushes more work onto the Twisted user that maybe should be hidden. Anything I'm missing?

comment:9 Changed 7 years ago by glyph

From the "kill(2)" manpage:

       If  pid  is less than -1, then sig is sent to every process in the pro‐
       cess group -pid.

Process groups aren't quite just "child processes". I don't believe Twisted has any APIs for dealing with process groups currently; perhaps signalProcessGroup API would be appropriate, if job objects are closely analogous to process groups (I have no idea, I don't know what a job object is).

comment:10 Changed 6 years ago by bhearsum

  • Cc bhearsum added

comment:11 Changed 6 years ago by Trent.Nelson

  • Cc Trent.Nelson added

comment:12 Changed 6 years ago by rcampbell

  • Cc rcampbell added

comment:13 Changed 6 years ago by rcampbell

just to link back and confirm that we've tested this somewhat, we've installed this patch on a buildbot machine and verified it worked by dropping network connectivity to the box during a test run. Buildbot was able to successfully kill processes under these conditions.

https://bugzilla.mozilla.org/show_bug.cgi?id=420216

comment:14 Changed 6 years ago by therve

  • author changed from deboute to therve
  • Branch set to branches/signal-process-group-2726

(In [23406]) Branching to 'signal-process-group-2726'

comment:15 Changed 6 years ago by therve

  • Branch changed from branches/signal-process-group-2726 to branches/signal-process-group-2726-2

(In [23482]) Branching to 'signal-process-group-2726-2'

comment:16 Changed 6 years ago by therve

  • Keywords review added
  • Owner changed from rhelmer to radix
  • Status changed from assigned to new

comment:17 Changed 6 years ago by exarkun

  • Keywords dumbwin32proc process review removed
  • Owner changed from radix to therve

(with radix and itamarst)

  • Can the new tests go into twisted.internet.test.test_process and use ReactorBuilder? :)
  • There's some problems with signalProcessGroup in the case where the child is not a session leader:
    • on posix, os.kill will fail with ESRCH; on windows, I dunno wtf will happen. There should be cross-platform error reporting (and tests for this of course).
    • it would be nice if spawnProcess(..., usePTY=False) could also do os.setsid if requested by the application. (It would also be nice if usePTY=True didn't force the os.setsid call.)
  • testing is fragile - make sure the grandchild was successfully created and that it exits with the correct status (ie, received SIGINT); ideally, this would all happen in the test function in the trial process (for maximum debuggability)
  • replicating the silent failure behavior of signalProcess in signalProcessGroup for the case where an unrecognized signal is specified is a bit unfortunate.
  • self.job = win32job.CreateJobObject(None, str(time.time())) - time.time()? Is that, like, a unique identifier? Maybe we could have a counter instead? time() has really low precision on Windows, too (and it can go backwards, etc) so it might not actually be unique.
  • The differences between process_creategroup.py and process_createjob.py seem not to be ideal. I don't know if there's anything to be done about that.

comment:18 Changed 6 years ago by therve

  • Branch changed from branches/signal-process-group-2726-2 to branches/signal-process-group-2726-3

(In [23665]) Branching to 'signal-process-group-2726-3'

comment:19 Changed 6 years ago by zseil

  • Cc zseil added

Just some notes about the Windows part of this patch:

  1. The second parameter to CreateJobObject() can't be None. You have to use an empty string if you want an unnamed job object.
  2. It would be nice if _dumbwin32proc.Process.processEnded() could close the job and the process handle.
  3. If the Twisted process itself is running in a job, the AssignProcessToJobObject() call would fail. The subprocess should at least be created with CREATE_BREAKAWAY_FROM_JOB flag. Even then the assignment to the job might fail, if the job in which Twisted is running doesn't have the JOB_OBJECT_LIMIT_BREAKAWAY_OK limit set.
  4. On a similar note, the job object created by Twisted should also have that limit set, so that the subprocess can assign its children to its own job objects. Something like this should work:
    def setBreakawayOk(job):
        extendedLimits = win32job.QueryInformationJobObject(job,
                                         win32job.JobObjectExtendedLimitInformation)
        basicLimits = extendedLimits["BasicLimitInformation"]
        basicLimits["LimitFlags"] |= win32job.JOB_OBJECT_LIMIT_BREAKAWAY_OK
        win32job.SetInformationJobObject(job,
                                         win32job.JobObjectExtendedLimitInformation,
                                         extendedLimits)
    
  5. Windows supports console process groups, which are somewhat similar to POSIX process groups. You can send console control events to a process group, and the C runtime library will translate them to signals. It would be nice if this support could be added to signalProcessGroup, so that the subprocesses could terminate gracefully. See the monkeypatches in the patch for ticket:1764 for how this could be done.

comment:20 Changed 6 years ago by zseil

Oh, and another thing, the docstring for IProcessTransport.signalProcess() should be fixed to mention which signalID's are supported on all platforms. Right now it implies that "HUP" and "STOP" are portable, and forgets to mention the "TERM" signalID.

comment:21 Changed 5 years ago by bhearsum

  • Keywords review added

comment:22 Changed 5 years ago by therve

  • Keywords review removed

Is there any reason to put this into review?

comment:23 Changed 5 years ago by exarkun

  • Owner changed from therve to exarkun
  • Status changed from new to assigned

Woops. I misunderstood the last few comments and suggested that bhearsum do that. I'm going to merge this branch forward to resolve some conflicts and then summarize the current state.

comment:24 Changed 5 years ago by exarkun

  • Author changed from therve to exarkun, therve
  • Branch changed from branches/signal-process-group-2726-3 to branches/signal-process-group-2726-4

(In [27779]) Branching to 'signal-process-group-2726-4'

comment:25 Changed 5 years ago by exarkun

  • Author changed from exarkun, therve to therve
  • Owner changed from exarkun to bhearsum
  • Status changed from assigned to new

Okay, merged forward, a bunch of conflicts resolved. Here's what needs to happen for this ticket to be resolved:

  1. One test is failing on Linux, twisted.test.test_process.PosixProcessTestCasePTY.test_signalProcessGroup. It fails like this: Failure: exceptions.ValueError: SIGINT: exitCode is 1, not None. I'm not sure why. Someone needs to investigate and fix it.
  2. twisted/test/process_signalgroup.py uses popen2, deprecated in the most recent releases of Python. It should be changed to use subprocess instead (I like popen2, too, but I guess we shouldn't introduce new uses of deprecated APIs).
  3. zseil's comments above (#comment:19, #comment:20) almost all seem reasonable to me (as someone who isn't really familiar with the Win32 job APIs). I think they should all be addressed, except perhaps the "console process groups" can probably be addressed separately (so we should file a new ticket for that feature when this ticket is ready to be resolved). (On the other hand, not being familiar with these APIs, it may be that the interface added in this branch won't be compatible with console process groups; someone should figure out whether that's the case or not). The new Win32 API calls in _dumbwin32proc.py could probably use some comments too, particularly if they are expanded based on zseil's suggestions (his descriptions seem adequate to me, so including that information in the implementation will probably satisfy me).
  4. The test helpers, twisted/test/process_signalgroup.py and twisted/test/process_signaljob.py could probably use a few comments explaining what they're supposed to be doing, just to ease the job of future readers.
  5. Someone should also run the unit tests on Windows and make sure they pass there.

Thanks to everyone who's worked on this so far. :) I think we're pretty close to being able to resolve this now, we just need a little more work.

I'll optimistically assign this to bhearsum, since he expressed interest in the feature. :) I'll try to be available for another review of this in a timely fashion if he or anyone else wants to address the remaining issues.

comment:26 Changed 5 years ago by bhearsum

I can probably address most of the points you mention, exarkun, 1, 2, and 4 sound pretty easy, 5 is easy, and I'll dig into 3 later.

Thanks for helping this along.

comment:27 Changed 5 years ago by marcusl

  • Cc marcusl added

comment:28 Changed 5 years ago by maruel

Using a job object is only an approximation and it doesn't work when the child processes use a job object, like we do in chromium with the sandbox.

Another way is to use toolhelp api to scan all the running processes on the system and manually determine the child processes and do recursively for grand-child. I'm not sure this can be done from python at all.

So the closest thing is taskkill.exe /PID PID /F /T

--- internet/_dumbwin32proc.py
+++ internet/_dumbwin32proc.py
@@ -244,8 +244,7 @@
         if self.pid is None:
             raise error.ProcessExitedAlready()
         if signalID in ("INT", "TERM", "KILL"):
-            win32process.TerminateProcess(self.hProcess, 1)
-
+            os.popen('taskkill /T /F /PID %s' % self.pid)

     def processEnded(self, status):
         """

The diff is against 8.1.0. I think using subprocess.call() would be better.
I can write a unit test and get that in if you think it's a good idea. Shelling out isn't great but it's the only way to do the correct thing.

comment:29 Changed 5 years ago by khorn

I think you could probably do the toolhelp thing using ctypes, though whether it's a good idea is a whole other question.

On the other hand, shelling out isn't very nice either...taskkill.exe doesn't exist on Win2k, is that a problem? Is Win2k still considered a supported platform for Twisted? I don't see a buildbot for it, but it's similar enough to XP that I'm not sure it should be dropped (if it hasn't been already).

Agree that if we go the shelling out route, subprocess is better (in general) than os.system, but what would that mean as far as blocking? Doesn't subprocess block? How long would it take for the taskkill command to finish?

comment:30 follow-up: Changed 5 years ago by exarkun

Using a job object is only an approximation

An approximation of what? Using job objects does provide a behavior which is useful for some cases. Job objects allow processes to be placed into a group and dealt with together for certain operations.

It's appears to be true that processes can opt out of this grouping, but that doesn't mean the grouping is never useful.

I'm far from a Win32 expert, but it seems like processes can opt out of the grouping that taskkill relies on. How does taskkill associate a grandchild with an ancestor if an intermediate child no longer exists?

And even if that's not the case, perhaps opting out is a useful feature to provide. Since that's closest to what the trunk API does, that's what I'm inclined to think should be added first. This doesn't mean we can't offer both behaviors though, as long as someone can figure out how.

As far as the suggested patch goes - spawning taskkill in a child process (with popen!) is not a viable approach. pywin32 and ctypes both expose the necessary APIs, so someone who wants to implement this in a way which could actually be included in Twisted should look at doing it with one of those libraries.

comment:31 Changed 5 years ago by maruel

khorn, Windows 2000 is not supported by Microsoft anymore; in fact Chromium doesn't support it (in part) since we can't even get Windows 2000 licenses with MSDN. So this is mostly irrelevant.

exarkun, if you go the job object route, as I said earlier, we'll have to fork the source file to not use it in chromium, since a process cannot be in two job objects at the same time and chromium already uses job objects and we need to use taskkill or its equivalent.

I have no idea about grandchild with missing parents with taskkill, I'd need to test it out. That would be an issue with using toolhelp too.

subprocess.Popen() could be used to make that asynchronous, but I feel the interface of this function is synchronous, hence my recommendation for subprocess.call().

comment:32 follow-up: Changed 5 years ago by exarkun

exarkun, if you go the job object route, as I said earlier, we'll have to fork the source file to not use it in chromium

But that's what you've done already, right? So nothing new there. ;)

Perhaps the best thing to do would be to carefully describe your requirements here. With that done, we can move on to trying to figure out which implementation strategy makes the most sense. That doesn't necessarily mean we won't move forward on this ticket using job objects, but it will give us something to go on for a new API. And it's always possible that, with the requirements in hand, we will be able to find a single solution that satisfies everybody.

comment:33 in reply to: ↑ 32 Changed 5 years ago by maruel

  • Cc maruel added

Replying to exarkun:

exarkun, if you go the job object route, as I said earlier, we'll have to fork the source file to not use it in chromium

But that's what you've done already, right? So nothing new there. ;)

We'd like to unfork.

Perhaps the best thing to do would be to carefully describe your requirements here. With that done, we can move on to trying to figure out which implementation strategy makes the most sense. That doesn't necessarily mean we won't move forward on this ticket using job objects, but it will give us something to go on for a new API. And it's always possible that, with the requirements in hand, we will be able to find a single solution that satisfies everybody.

The requirements for us are:

  • Not use job objects
  • Kill child processes

The first step would be to make Process.signalProcess() return a defered but that would change the function signature and defer is not used at all across this file, I assume this is by design.

comment:34 in reply to: ↑ 30 ; follow-up: Changed 3 years ago by dabrahams

Replying to exarkun:

As far as the suggested patch goes - spawning taskkill in a child process (with popen!) is not a viable approach. pywin32 and ctypes both expose the necessary APIs, so someone who wants to implement this in a way which could actually be included in Twisted should look at doing it with one of those libraries.

Before I spend time on this, does anyone care which of these is depended on? I think depending on PyWin32 would make the job much easier, and #3707 seems to indicate that we can't do anything at all with processes unless PyWin32 is installed anyway.

comment:35 in reply to: ↑ 34 Changed 3 years ago by glyph

Replying to dabrahams:

Replying to exarkun:

As far as the suggested patch goes - spawning taskkill in a child process (with popen!) is not a viable approach. pywin32 and ctypes both expose the necessary APIs, so someone who wants to implement this in a way which could actually be included in Twisted should look at doing it with one of those libraries.

Before I spend time on this, does anyone care which of these is depended on? I think depending on PyWin32 would make the job much easier, and #3707 seems to indicate that we can't do anything at all with processes unless PyWin32 is installed anyway.

That's correct. We already depend on pywin32 for spawnProcess on Windows. See the source for twisted.internet._dumbwin32proc for details.

comment:36 Changed 3 years ago by exarkun

Before I spend time on this, does anyone care which of these is depended on?

I think we should not depend on taskkill. However, a broader point is that I think it would be very useful to summarize the discussion which has occurred on this ticket. It would be nice to have a clear outline of the goals, the proposed implementation technique (either what is in the branch, or what is going to replace that, or even better, both) and the objections that have been raised.

As it is, reading the comments, I think I can see how I would proceed on this ticket, but I would be surprised if everyone else reading the comments came to the same conclusion.

comment:37 Changed 3 years ago by dabrahams

Okay, let me be specific: I propose implementing the method described in http://stackoverflow.com/questions/604522/performing-equivalent-of-kill-process-tree-in-c-on-windows/604567#604567 (translated to Python on PyWin32 of course).

comment:38 Changed 3 years ago by exarkun

That answer talks about job objects and manually walking the process "tree". Which one, or both?

comment:39 Changed 3 years ago by dabrahams

If it's OK with twisted to alter how processes are launched, then I'll create Job objects. If not, I'll walk the process tree. Which is preferred?

comment:40 follow-up: Changed 3 years ago by exarkun

Since there have been some complaints about the use of Job objects, it might be better to avoid them if possible in IReactorProcess.spawnProcess. We might want to introduce a new API that works with Job objects (since I expect they are strictly superior to walking the process tree).

comment:41 in reply to: ↑ 40 Changed 3 years ago by dabrahams

Replying to exarkun:

Since there have been some complaints about the use of Job objects, it might be better to avoid them if possible in IReactorProcess.spawnProcess.

Good; I'm much more comfortable making a less-intrusive change that doesn't touch process launching.

[But what kind of complaints? I'm curious]

We might want to introduce a new API that works with Job objects (since I expect they are strictly superior to walking the process tree).

Really? What would you do for *nix, essentially build an equivalent abstraction to MS's Job objects?

comment:42 Changed 3 years ago by exarkun

[But what kind of complaints? I'm curious]

A process created in a Job cannot, I guess, create new Jobs. Or something along those lines. maruel said that his use of spawnProcess (by way of buildbot) was of processes that create a Job, and so Twisted changing to create Jobs would break his setup.

Really? What would you do for *nix, essentially build an equivalent abstraction to MS's Job objects?

Process groups are... sort of... similar... ish. I guess (read about passing a negative PID to kill(2)). But it may also turn out that there isn't a reasonable way to provide a kill-this-process-and-a-semi-arbitrary-set-of-related-processes API meaningfully across multiple platforms and this is a place where higher level application code will need to make an intelligent decision. I'm not sure! It will take a bit of thinking to figure out, I guess.

Twisted is happy to expose platform-specific features though (eg inotify). Cross-platform is certainly better, but something is better than nothing.

comment:43 Changed 3 years ago by maruel

I think going with Job object is fine; it's the closest thing that can be used on Windows and I think our use case is not the common one. So we'll just continue hacking the sources. Making it easy to monkey patch is a plus.

comment:44 follow-up: Changed 3 years ago by dabrahams

OK, now I'm getting mixed messages. I think I'm going to let someone else who is more familiar with the Twisted community figure this out.

comment:45 in reply to: ↑ 44 Changed 3 years ago by glyph

Replying to dabrahams:

OK, now I'm getting mixed messages. I think I'm going to let someone else who is more familiar with the Twisted community figure this out.

Trac needs better UI for identifying committers or something. Well, I guess it needs better UI for everything.

You can safely ignore everybody but exarkun. Or at least take their comments in a purely advisory capacity :). Please don't get discouraged by the discussion and leave; this is a ticket we really need help with.

Thanks for your participation so far.

Note: See TracTickets for help on using tickets.