[Twisted-Python] Deadlocks when launching processes - how to investigate?

Itamar Turner-Trauring itamar at itamarst.org
Sat Feb 15 07:38:53 MST 2014


On 02/14/2014 07:21 AM, Orestis Markou wrote:
> Hello,
>
> I just filed https://twistedmatrix.com/trac/ticket/6972
>
> The issue I'm facing is a deadlocked Python on OS X when a lot of 
> processes are spawned. In the repro script we do this very 
> aggressively to trigger the deadlock quickly, but the actual program 
> that does this "ticks" every minute.
>
> There is a possibility that this is either a Python bug or an OS X 
> issue as the same program used to run fine in 10.5 and after some 
> upgrades to 10.7 this issue appeared. We worked around it by using 2.6 
> but now we need 2.7.
>
> I know this is going to be difficult for people to reproduce, so I 
> wonder if someone can help me investigate the issue further. I found 
> this https://dev.launchpad.net/Debugging/GDB but it doesn't work - I 
> believe, without being able to confirm, that the issue is that GDB 
> can't really work with clang-built executables? Or perhaps I don't 
> have the debugging symbols.

When debugging Python deadlocks in general:

1. Try https://pypi.python.org/pypi/faulthandler/ - send the appropriate 
signal when the process deadlocks to get the Python tracebacks dumped.

2. If that doesn't work, I have had good luck debugging at least one 
Python mystery freeze with GDB. In particular, because it (sometimes) has 
built-in Python support, you can actually get a Python traceback. This 
assumes access to debugging symbols, though. lldb may have similar 
functionality; Googling may help with that and with finding debug symbols.
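
For step 1, here is a minimal sketch of wiring up faulthandler so that a 
signal dumps the tracebacks of all threads. It assumes Python 3.3+, where 
faulthandler is in the standard library (on 2.7 the PyPI package linked 
above should expose the same calls). The choice of SIGUSR1 and the helper 
name install_traceback_dumper are mine, purely for illustration:

```python
import faulthandler
import signal
import sys


def install_traceback_dumper(signum=signal.SIGUSR1, stream=sys.stderr):
    """Dump every thread's Python traceback to `stream` when `signum` arrives.

    Hypothetical helper name; SIGUSR1 is an arbitrary choice of signal.
    `stream` must be a real file (faulthandler writes to its file descriptor).
    """
    # The dump is done from C, so it should work even if the
    # interpreter is stuck and cannot run Python-level signal handlers.
    faulthandler.register(signum, file=stream, all_threads=True)
    return signum
```

Call install_traceback_dumper() early in the program, then run 
"kill -USR1 <pid>" from another terminal once the process hangs.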


In this particular case, the traceback plus some googling 
(http://bugs.python.org/issue11768 is what I found, though presumably a 
different bug) suggests the bug may be something like the signal 
handler not being re-entrant for some reason, so that you're getting a 
SIGCHLD while the C code is still handling a previous SIGCHLD. Try 
disabling the SIGCHLD handler and just calling 
"twisted.internet.process.reapAllProcesses()" a few times a second, and 
see if that's a good workaround - if so, add a note to the ticket. If 
that is the case, you may be able to reproduce the bug by setting a 
SIGCHLD handler and then sending SIGCHLD to the process a lot, with no 
Twisted involved.
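
A rough sketch of that Twisted-free reproduction attempt, using only the 
standard library. The function name flood_sigchld and the default count 
are hypothetical, and I have not confirmed that this triggers the 
deadlock; it only sets up the conditions described above. Note that 
pending signals can coalesce, so the handler may run fewer times than 
the number of signals sent:

```python
import os
import signal


def flood_sigchld(count=10000):
    """Install a Python-level SIGCHLD handler, then signal ourselves a lot.

    Returns how many times the handler actually ran. Must be called
    from the main thread, since signal.signal() requires that.
    """
    received = []

    def on_sigchld(signum, frame):
        # Keep the handler trivial; we only care whether delivery hangs.
        received.append(signum)

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.getpid()
        for _ in range(count):
            os.kill(pid, signal.SIGCHLD)
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGCHLD, previous)
    return len(received)
```

If flood_sigchld() hangs on the affected OS X / Python 2.7 combination 
but returns on others, that would be worth adding to the ticket.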


