Opened 12 years ago

Closed 8 years ago

#791 defect closed invalid (invalid)

reapAllProcesses occassionally, as apparently signal delivery is unreliable

Reported by: itamarst Owned by:
Priority: high Milestone:
Component: core Keywords:
Cc: itamarst, jknight, spiv, exarkun, glyph, jml Branch: branches/sigchld-791-3
branch-diff, diff-cov, branch-cov, buildbot
Author: glyph


Change History (17)

comment:1 Changed 12 years ago by jknight

Has this actually been verified to be true?

comment:2 Changed 10 years ago by glyph

  • Component set to conch

It doesn't matter; see comments on #1997. We need to do this anyway.

comment:3 Changed 10 years ago by glyph

  • Component changed from conch to core

comment:4 Changed 10 years ago by jknight

Implemented fix in source:branches/sigchld-791/

Technique: a C module which installs a signal handler for SIGCHLD, with SA_RESTART, and a C function to write to a given fd. The reactor listens on that fd, and calls reapAllProcesses when there's activity.

However, if the C module is not present, it instead starts a LoopingCall running whenever there are processes waiting to be reaped, and doesn't install a SIGCHLD handler at all.

comment:5 Changed 10 years ago by jknight

  • Cc spiv exarkun glyph jml added

Also related to and should fix #1430 #1997

comment:6 Changed 10 years ago by jknight

And also #733

comment:7 Changed 10 years ago by therve

  • Keywords review added
  • Owner itamarst deleted

It seems it would need review ?

comment:8 Changed 10 years ago by exarkun

  • Keywords review removed

It doesn't have sufficient unit tests.

comment:9 Changed 10 years ago by mithrandi

  • Owner set to jknight

comment:10 Changed 10 years ago by itamarst

The LoopingCall should also be used if someone does, if it's not already. And probably should stop running if there's no Process instances and restart if necessary (so trial is happy), again, if it's not already got that behavior in the branch.

comment:11 Changed 9 years ago by exarkun

In addition to #733 and #1997, see #2535.

comment:12 Changed 9 years ago by therve

  • author set to therve
  • Branch set to branches/sigchld-791-2

(In [23160]) Branching to 'sigchld-791-2'

comment:13 Changed 9 years ago by therve

I had the opportunity to look at this a bit more. I think the looping call solution is wrong. Do we know exactly what platform doesn't have SA_RESTART? I think I'd prefer the current behavior with SIGCHLD than a polling system that browses every processes launched 10 times per second. This conflicts pretty badly with #2967.

comment:14 Changed 9 years ago by glyph

  • Owner changed from jknight to glyph
  • Status changed from new to assigned

comment:15 Changed 9 years ago by glyph

  • author changed from therve to glyph
  • Branch changed from branches/sigchld-791-2 to branches/sigchld-791-3

(In [23726]) Branching to 'sigchld-791-3'

comment:16 Changed 8 years ago by exarkun

  • Resolution set to invalid
  • Status changed from assigned to closed

SIGCHLD delivery is not unreliable. What is meant by "unreliable" when people talk about signal delivery is that multiple instances of a particular signal may be coalesced into one call to a signal handler in a process. So polling isn't necessary to deal with "unreliable signal delivery".

However, some of the code in this branch is still probably useful for resolving some Twisted process issues. For example, sigaction SA_RESTART prevents us from breaking os.popen. So some of this code should be used to resolve some tickets.

I'm closing this ticket because the problem it describes does not exist and the solution it suggests is therefore unnecessary. Since there is useful code in the branch, I will re-associate it with the ticket it's most directed at fixing, #733 (the branch conflicts with trunk, so it needed to be merged forward anyway).

comment:17 Changed 6 years ago by <automation>

  • Owner glyph deleted
Note: See TracTickets for help on using tickets.