Opened 5 years ago

Closed 2 years ago

#6972 defect closed wontfix (wontfix)

Deadlock when spawning a lot of processes, spawnProcess hangs the reactor

Reported by: orestis Owned by: Glyph
Priority: normal Milestone:
Component: core Keywords:
Cc: orestis Branch:
Author:

Description

I have a long-running process (days/months) that acts like a watchdog, periodically launching other process to check on system status etc. After a Mac OS X upgrade, I see deadlocks.

I have a minimal reproduction script:

import sys
import twisted

print "running twisted", twisted.version, "on python"
print sys.version

from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks
from twisted.internet import utils
from twisted.internet.task import LoopingCall


from twisted.python import log
log.startLogging(sys.stdout)

@inlineCallbacks
def doit():
    #log.msg("List starting")
    listStatus = yield utils.getProcessValue(
        "/bin/launchctl",
        ("list", "com.apple.lalala", )
    )

@inlineCallbacks
def doit2():
    pm = yield utils.getProcessValue(
        "/usr/bin/pmset",
        ("-g", )
    )
    log.msg("pmset status %s" % pm)

C = 0
def batch():
    global C
    C +=1
    print 'Tick', C
    doit2()
    for _ in range(80):
        doit()

TICK = 1

c = LoopingCall(batch)
reactor.callWhenRunning(c.start, TICK)

reactor.run()

That outputs:

running twisted [twisted, version 13.2.0] on python
2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]
2014-02-14 14:08:42+0200 [-] Log opened.
2014-02-14 14:08:42+0200 [-] Tick 1
2014-02-14 14:08:42+0200 [-] pmset status 0
2014-02-14 14:08:43+0200 [-] Tick 2
2014-02-14 14:08:44+0200 [-] pmset status 0
2014-02-14 14:08:44+0200 [-] Tick 3
2014-02-14 14:08:45+0200 [-] pmset status 0
2014-02-14 14:08:45+0200 [-] Tick 4
2014-02-14 14:08:46+0200 [-] pmset status 0
2014-02-14 14:08:46+0200 [-] Tick 5
2014-02-14 14:08:47+0200 [-] pmset status 0
2014-02-14 14:08:47+0200 [-] Tick 6
2014-02-14 14:08:48+0200 [-] pmset status 0
2014-02-14 14:08:48+0200 [-] Tick 7
2014-02-14 14:08:49+0200 [-] pmset status 0
2014-02-14 14:08:49+0200 [-] Tick 8
2014-02-14 14:08:50+0200 [-] pmset status 0
2014-02-14 14:08:50+0200 [-] Tick 9
2014-02-14 14:08:51+0200 [-] pmset status 0
2014-02-14 14:08:51+0200 [-] Tick 10
2014-02-14 14:08:52+0200 [-] pmset status 0
2014-02-14 14:08:52+0200 [-] Tick 11
2014-02-14 14:08:53+0200 [-] pmset status 0
2014-02-14 14:08:53+0200 [-] Tick 12
2014-02-14 14:08:54+0200 [-] pmset status 0
2014-02-14 14:08:54+0200 [-] Tick 13
2014-02-14 14:08:55+0200 [-] pmset status 0
2014-02-14 14:08:55+0200 [-] Tick 14
2014-02-14 14:08:55+0200 [-] pmset status 0
2014-02-14 14:08:56+0200 [-] Tick 15
2014-02-14 14:08:57+0200 [-] pmset status 0
2014-02-14 14:08:57+0200 [-] Tick 16
2014-02-14 14:08:57+0200 [-] pmset status 0
2014-02-14 14:08:58+0200 [-] Tick 17
2014-02-14 14:08:58+0200 [-] pmset status 0
2014-02-14 14:08:59+0200 [-] Tick 18
2014-02-14 14:09:00+0200 [-] pmset status 0
2014-02-14 14:09:00+0200 [-] Tick 19
2014-02-14 14:09:01+0200 [-] pmset status 0
2014-02-14 14:09:01+0200 [-] Tick 20
2014-02-14 14:09:01+0200 [-] pmset status 0
2014-02-14 14:09:02+0200 [-] Tick 21
2014-02-14 14:09:03+0200 [-] pmset status 0
2014-02-14 14:09:03+0200 [-] Tick 22
2014-02-14 14:09:03+0200 [-] pmset status 0
2014-02-14 14:09:04+0200 [-] Tick 23
2014-02-14 14:09:05+0200 [-] pmset status 0
2014-02-14 14:09:05+0200 [-] Tick 24
2014-02-14 14:09:05+0200 [-] pmset status 0
2014-02-14 14:09:06+0200 [-] Tick 25
2014-02-14 14:09:07+0200 [-] pmset status 0
2014-02-14 14:09:07+0200 [-] Tick 26
2014-02-14 14:09:08+0200 [-] pmset status 0
2014-02-14 14:09:08+0200 [-] Tick 27

And then hangs there. Sampling the process (via Activity Monitor) yields this:

Sampling process 71737 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling Python (pid 71737) every 1 millisecond
Process:         Python [71737]
Path:            /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Load Address:    0x10ca21000
Identifier:      Python
Version:         ???
Code Type:       X86-64
Parent Process:  fish [618]

Date/Time:       2014-02-14 14:09:54.127 +0200
OS Version:      Mac OS X 10.9.1 (13B42)
Report Version:  7

Call graph:
    2805 Thread_622607   DispatchQueue_1: com.apple.main-thread  (serial)
      2805 start  (in libdyld.dylib) + 1  [0x7fff86f345fd]
        2805 Py_Main  (in Python) + 2995  [0x10cadb363]
          2805 PyRun_SimpleFileExFlags  (in Python) + 718  [0x10caca821]
            2805 PyRun_FileExFlags  (in Python) + 137  [0x10cacacd3]
              2805 ???  (in Python)  load address 0x10ca2c000 + 0x9ec2c  [0x10cacac2c]
                2805 PyEval_EvalCode  (in Python) + 54  [0x10caaba24]
                  2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                    2805 PyEval_EvalFrameEx  (in Python) + 13079  [0x10caaf4d4]
                      2805 ???  (in Python)  load address 0x10ca2c000 + 0x868c8  [0x10cab28c8]
                        2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                          2805 PyEval_EvalFrameEx  (in Python) + 13079  [0x10caaf4d4]
                            2805 ???  (in Python)  load address 0x10ca2c000 + 0x868c8  [0x10cab28c8]
                              2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                2805 PyEval_EvalFrameEx  (in Python) + 13079  [0x10caaf4d4]
                                  2805 ???  (in Python)  load address 0x10ca2c000 + 0x868c8  [0x10cab28c8]
                                    2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                      2805 PyEval_EvalFrameEx  (in Python) + 16856  [0x10cab0395]
                                        2805 PyObject_Call  (in Python) + 101  [0x10ca35f72]
                                          2805 ???  (in Python)  load address 0x10ca2c000 + 0x4ecad  [0x10ca7acad]
                                            2805 PyObject_Call  (in Python) + 101  [0x10ca35f72]
                                              2805 ???  (in Python)  load address 0x10ca2c000 + 0x149a7  [0x10ca409a7]
                                                2805 PyObject_Call  (in Python) + 101  [0x10ca35f72]
                                                  2805 ???  (in Python)  load address 0x10ca2c000 + 0x27796  [0x10ca53796]
                                                    2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                                      2805 PyEval_EvalFrameEx  (in Python) + 16856  [0x10cab0395]
                                                        2805 PyObject_Call  (in Python) + 101  [0x10ca35f72]
                                                          2805 ???  (in Python)  load address 0x10ca2c000 + 0x27796  [0x10ca53796]
                                                            2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                                              2805 PyEval_EvalFrameEx  (in Python) + 16856  [0x10cab0395]
                                                                2805 PyObject_Call  (in Python) + 101  [0x10ca35f72]
                                                                  2805 ???  (in Python)  load address 0x10ca2c000 + 0x27796  [0x10ca53796]
                                                                    2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                                                      2805 PyEval_EvalFrameEx  (in Python) + 13079  [0x10caaf4d4]
                                                                        2805 ???  (in Python)  load address 0x10ca2c000 + 0x868c8  [0x10cab28c8]
                                                                          2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                                                            2805 PyEval_EvalFrameEx  (in Python) + 13079  [0x10caaf4d4]
                                                                              2805 ???  (in Python)  load address 0x10ca2c000 + 0x868c8  [0x10cab28c8]
                                                                                2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                                                                  2805 PyEval_EvalFrameEx  (in Python) + 13853  [0x10caaf7da]
                                                                                    2805 ???  (in Python)  load address 0x10ca2c000 + 0x1ef77  [0x10ca4af77]
                                                                                      2805 PyEval_EvalFrameEx  (in Python) + 13079  [0x10caaf4d4]
                                                                                        2805 ???  (in Python)  load address 0x10ca2c000 + 0x868c8  [0x10cab28c8]
                                                                                          2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                                                                            2805 PyEval_EvalFrameEx  (in Python) + 13079  [0x10caaf4d4]
                                                                                              2805 ???  (in Python)  load address 0x10ca2c000 + 0x868c8  [0x10cab28c8]
                                                                                                2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                                                                                  2805 PyEval_EvalFrameEx  (in Python) + 13079  [0x10caaf4d4]
                                                                                                    2805 ???  (in Python)  load address 0x10ca2c000 + 0x868c8  [0x10cab28c8]
                                                                                                      2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                                                                                        2805 PyEval_EvalFrameEx  (in Python) + 15416  [0x10caafdf5]
                                                                                                          2805 PyObject_Call  (in Python) + 101  [0x10ca35f72]
                                                                                                            2805 ???  (in Python)  load address 0x10ca2c000 + 0x4b184  [0x10ca77184]
                                                                                                              2805 ???  (in Python)  load address 0x10ca2c000 + 0x4f6ce  [0x10ca7b6ce]
                                                                                                                2805 PyObject_Call  (in Python) + 101  [0x10ca35f72]
                                                                                                                  2805 ???  (in Python)  load address 0x10ca2c000 + 0x149a7  [0x10ca409a7]
                                                                                                                    2805 PyObject_Call  (in Python) + 101  [0x10ca35f72]
                                                                                                                      2805 ???  (in Python)  load address 0x10ca2c000 + 0x27796  [0x10ca53796]
                                                                                                                        2805 PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
                                                                                                                          2805 PyEval_EvalFrameEx  (in Python) + 559  [0x10caac3ec]
                                                                                                                            2805 Py_MakePendingCalls  (in Python) + 149  [0x10caab8f3]
                                                                                                                              2805 PyThread_acquire_lock  (in Python) + 31  [0x10cad9583]
                                                                                                                                2805 ???  (in <unknown binary>)  [0x70000000e]
                                                                                                                                  2805 _sigtramp  (in libsystem_platform.dylib) + 26  [0x7fff8b8e45aa]
                                                                                                                                    2805 ???  (in Python)  load address 0x10ca2c000 + 0xb2068  [0x10cade068]
                                                                                                                                      2805 ???  (in Python)  load address 0x10ca2c000 + 0xb219d  [0x10cade19d]
                                                                                                                                        2805 Py_AddPendingCall  (in Python) + 51  [0x10caab7d3]
                                                                                                                                          2805 PyThread_acquire_lock  (in Python) + 31  [0x10cad9583]
                                                                                                                                            2805 _pthread_mutex_lock  (in libsystem_pthread.dylib) + 372  [0x7fff8c2ad779]
                                                                                                                                              2805 __psynch_mutexwait  (in libsystem_kernel.dylib) + 10  [0x7fff86bbe746]

Total number in stack (recursive counted multiple, when >=5):
        13       PyEval_EvalCodeEx  (in Python) + 1641  [0x10caac093]
        8       ???  (in Python)  load address 0x10ca2c000 + 0x868c8  [0x10cab28c8]
        8       PyEval_EvalFrameEx  (in Python) + 13079  [0x10caaf4d4]
        8       PyObject_Call  (in Python) + 101  [0x10ca35f72]

Sort by top of stack, same collapsed (when >= 5):
        __psynch_mutexwait  (in libsystem_kernel.dylib)        2805

Binary Images:
       0x10ca21000 -        0x10ca21fff  org.python.python (2.7.5 - 2.7.5) <2E89A7CC-BF9F-341D-951F-D4BCAEDE52E3> /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
       0x10ca2c000 -        0x10cb16fff  org.python.python (2.7.5 - 2.7.5) <0202C38E-0354-34FA-9996-8D2ADCE353D8> /System/Library/Frameworks/Python.framework/Versions/2.7/Python
       0x10cd1a000 -        0x10cd1bfff  time.so (76) <60566D0D-485D-3EE6-BD9F-3C330C973922> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/time.so
       0x10cd21000 -        0x10cd24fff  strop.so (76) <935E433D-305A-303F-A01A-80DEE79CD72C> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/strop.so
       0x10cd6a000 -        0x10cd6cfff  binascii.so (76) <B0E73D99-049D-3871-940A-64B4C87305FE> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/binascii.so
       0x10cd70000 -        0x10cd79fff  datetime.so (76) <0DDECD56-F93B-38D2-8012-2D7F021BF41D> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/datetime.so
       0x10cd82000 -        0x10cd83fff  cStringIO.so (76) <33F5D92E-34D7-3B43-8626-BDB07AB03941> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/cStringIO.so
       0x10cd88000 -        0x10cd8dfff  pyexpat.so (76) <0D2CD180-56F8-3AB7-B5B2-7BFCA54B6ED4> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/pyexpat.so
       0x10ce14000 -        0x10ce1bff7  _socket.so (76) <ABDAD166-1FB8-3B83-AFC1-7BF6A97F229B> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_socket.so
       0x10ce25000 -        0x10ce26fff  _functools.so (76) <4A72F6F8-BB2F-357F-8CF1-FAF3CD4371C2> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_functools.so
       0x10ce2a000 -        0x10ce2dfff  _ssl.so (76) <E4C0816B-0AFD-3360-89AC-6E4B5FB42976> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_ssl.so
       0x10ce33000 -        0x10ce36fff  _struct.so (76) <32BDEED3-1674-30E5-8C71-9536BFFEB0DE> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_struct.so
       0x10ce3c000 -        0x10ce4afff  _io.so (76) <C18AD571-D722-3780-AAF6-F586F97A10F3> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so
       0x10ce59000 -        0x10ce5bfff  select.so (76) <50E27D67-1E2C-3481-81A3-CCB53750968B> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/select.so
       0x10cfa1000 -        0x10cfa4fff  operator.so (76) <57797464-DDAA-32B0-A680-BC3583B4BDEF> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/operator.so
       0x10cfaa000 -        0x10cfacfff  _collections.so (76) <6E5B8C6C-361D-34FA-8883-E385DECFC519> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_collections.so
       0x10cfb2000 -        0x10cfb6fff  itertools.so (76) <0FFE8556-9608-3A7C-B0D0-995902493A91> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/itertools.so
       0x10cfbf000 -        0x10cfc0fff  _heapq.so (76) <EBFA8646-5C30-3853-8800-C4BDFC9D201E> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_heapq.so
       0x10cfc5000 -        0x10cfc6fff  fcntl.so (76) <90ABCE6A-D5CE-3870-97A6-E3C46FC7D4A2> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/fcntl.so
       0x10d04a000 -        0x10d04afff  grp.so (76) <2BA0ACBA-C009-384C-8981-E882A2DE01B1> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/grp.so
       0x10d08e000 -        0x10d09cfff  _ctypes.so (76) <421053C0-2110-3BCA-AE7C-25DF92BEA34A> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_ctypes.so
       0x10d0a7000 -        0x10d0aefff  crypto.so (8) <D9D43454-2328-3B74-B820-9C2FA233BDDE> /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/OpenSSL/crypto.so
       0x10d0bc000 -        0x10d0bcfff  rand.so (8) <650CE7AD-B461-362B-BA02-0B1A182B5D5D> /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/OpenSSL/rand.so
       0x10d0c0000 -        0x10d0c5fff  SSL.so (8) <5F4B36DC-5B10-376D-812B-32181C1C2444> /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/OpenSSL/SSL.so
       0x10d10f000 -        0x10d112ff7  math.so (76) <76BA19D5-48F5-31C5-9AF7-0C855F30890B> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/math.so
       0x10d118000 -        0x10d119fff  _hashlib.so (76) <81D1416F-11BD-33BF-BDB5-9D432BF241C4> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_hashlib.so
       0x10d11e000 -        0x10d11fff7  _random.so (76) <45324BA4-BC2A-3342-B41E-9ADDF6AD12F5> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_random.so
       0x10d163000 -        0x10d164fff  termios.so (76) <D8E846F9-BB9A-3D9C-924B-D3E8282E3042> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/termios.so
    0x7fff6c443000 -     0x7fff6c476817  dyld (239.3 - ???) <D1DFCF3F-0B0C-332A-BCC0-87A851B570FF> /usr/lib/dyld
    0x7fff867d1000 -     0x7fff867d3ff7  libquarantine.dylib (71) <7A1A2BCB-C03D-3A25-BFA4-3E569B2D2C38> /usr/lib/system/libquarantine.dylib
    0x7fff86827000 -     0x7fff8682eff7  liblaunch.dylib (842.1.4) <FCBF0A02-0B06-3F97-9248-5062A9DEB32C> /usr/lib/system/liblaunch.dylib
    0x7fff86ba9000 -     0x7fff86bc5ff7  libsystem_kernel.dylib (2422.1.72) <D14913DB-47F1-3591-8DAF-D4B4EF5F8818> /usr/lib/system/libsystem_kernel.dylib
    0x7fff86c85000 -     0x7fff86c8cfff  libcompiler_rt.dylib (35) <4CD916B2-1B17-362A-B403-EF24A1DAC141> /usr/lib/system/libcompiler_rt.dylib
    0x7fff86ce2000 -     0x7fff86ce3ff7  libDiagnosticMessagesClient.dylib (100) <4CDB0F7B-C0AF-3424-BC39-495696F0DB1E> /usr/lib/libDiagnosticMessagesClient.dylib
    0x7fff86ee2000 -     0x7fff86f11fd2  libsystem_m.dylib (3047.16) <B7F0E2E4-2777-33FC-A787-D6430B630D54> /usr/lib/system/libsystem_m.dylib
    0x7fff86f31000 -     0x7fff86f34ff7  libdyld.dylib (239.3) <62F4D752-4089-31A8-8B73-B95A68893B3C> /usr/lib/system/libdyld.dylib
    0x7fff86f51000 -     0x7fff86f6bfff  libdispatch.dylib (339.1.9) <46878A5B-4248-3057-962C-6D4A235EEF31> /usr/lib/system/libdispatch.dylib
    0x7fff87651000 -     0x7fff87652ff7  libsystem_sandbox.dylib (278.10) <A47E7E11-3C76-318E-B67D-98972B86F094> /usr/lib/system/libsystem_sandbox.dylib
    0x7fff87a17000 -     0x7fff87a17ff7  libkeymgr.dylib (28) <3AA8D85D-CF00-3BD3-A5A0-E28E1A32A6D8> /usr/lib/system/libkeymgr.dylib
    0x7fff87a25000 -     0x7fff87a36ff7  libsystem_asl.dylib (217.1.4) <655FB343-52CF-3E2F-B14D-BEBF5AAEF94D> /usr/lib/system/libsystem_asl.dylib
    0x7fff87ce4000 -     0x7fff87ce8ff7  libcache.dylib (62) <BDC1E65B-72A1-3DA3-A57C-B23159CAAD0B> /usr/lib/system/libcache.dylib
    0x7fff88204000 -     0x7fff8822bffb  libsystem_info.dylib (449.1.3) <7D41A156-D285-3849-A2C3-C04ADE797D98> /usr/lib/system/libsystem_info.dylib
    0x7fff884e8000 -     0x7fff884f0fff  libsystem_dnssd.dylib (522.1.11) <270DCF6C-502D-389A-AA9F-DE4624A36FF7> /usr/lib/system/libsystem_dnssd.dylib
    0x7fff88542000 -     0x7fff8855afff  libexpat.1.dylib (12) <97F4A9A7-CB3E-3BBF-9314-4997FC770E52> /usr/lib/libexpat.1.dylib
    0x7fff8914e000 -     0x7fff8914fffb  libremovefile.dylib (33) <3543F917-928E-3DB2-A2F4-7AB73B4970EF> /usr/lib/system/libremovefile.dylib
    0x7fff8930f000 -     0x7fff89313fff  libsystem_stats.dylib (93.1.26) <B9E26A9E-FBBC-3938-B8B7-6CF7CA8C99AD> /usr/lib/system/libsystem_stats.dylib
    0x7fff893d3000 -     0x7fff893eeff7  libsystem_malloc.dylib (23.1.10) <FFE5C472-B23A-318A-85BF-77CDE61900D1> /usr/lib/system/libsystem_malloc.dylib
    0x7fff894b8000 -     0x7fff8950afff  libc++.1.dylib (120) <4F68DFC5-2077-39A8-A449-CAC5FDEE7BDE> /usr/lib/libc++.1.dylib
    0x7fff8a071000 -     0x7fff8a07bfff  libcommonCrypto.dylib (60049) <8C4F0CA0-389C-3EDC-B155-E62DD2187E1D> /usr/lib/system/libcommonCrypto.dylib
    0x7fff8a07c000 -     0x7fff8a07eff3  libsystem_configuration.dylib (596.12) <C4F633D9-94C8-35D9-BB2D-84C5122533C7> /usr/lib/system/libsystem_configuration.dylib
    0x7fff8a278000 -     0x7fff8a301ff7  libsystem_c.dylib (997.1.1) <61833FAA-7281-3FF9-937F-686B6F20427C> /usr/lib/system/libsystem_c.dylib
    0x7fff8a369000 -     0x7fff8a36afff  libunc.dylib (28) <62682455-1862-36FE-8A04-7A6B91256438> /usr/lib/system/libunc.dylib
    0x7fff8b52e000 -     0x7fff8b552fff  libxpc.dylib (300.1.17) <4554927A-9467-365C-91F1-5A116989DD7F> /usr/lib/system/libxpc.dylib
    0x7fff8b553000 -     0x7fff8b738ff7  com.apple.CoreFoundation (6.9 - 855.11) <E22C6A1F-8996-349C-905E-96C3BBE07C2F> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
    0x7fff8b739000 -     0x7fff8b787fff  libcorecrypto.dylib (161.1) <F3973C28-14B6-3006-BB2B-00DD7F09ABC7> /usr/lib/system/libcorecrypto.dylib
    0x7fff8b8e1000 -     0x7fff8b8e7ff7  libsystem_platform.dylib (24.1.4) <331BA4A5-55CE-3B95-99EB-44E0C89D7FB8> /usr/lib/system/libsystem_platform.dylib
    0x7fff8bdb6000 -     0x7fff8be95fff  libcrypto.0.9.8.dylib (50) <B95B9DBA-39D3-3EEF-AF43-44608B28894E> /usr/lib/libcrypto.0.9.8.dylib
    0x7fff8c00c000 -     0x7fff8c04eff7  libauto.dylib (185.5) <F45C36E8-B606-3886-B5B1-B6745E757CA8> /usr/lib/libauto.dylib
    0x7fff8c2a9000 -     0x7fff8c2b0ff7  libsystem_pthread.dylib (53.1.4) <AB498556-B555-310E-9041-F67EC9E00E2C> /usr/lib/system/libsystem_pthread.dylib
    0x7fff8d830000 -     0x7fff8d841ff7  libz.1.dylib (53) <42E0C8C6-CA38-3CA4-8619-D24ED5DD492E> /usr/lib/libz.1.dylib
    0x7fff8d87d000 -     0x7fff8da35ff3  libicucore.A.dylib (511.27) <003B6C21-CBD1-3486-9A1D-030ADF5FA061> /usr/lib/libicucore.A.dylib
    0x7fff8da4e000 -     0x7fff8da75ff7  libsystem_network.dylib (241.3) <8B1E1F1D-A5CC-3BAE-8B1E-ABC84337A364> /usr/lib/system/libsystem_network.dylib
    0x7fff8e9c7000 -     0x7fff8e9ccff7  libunwind.dylib (35.3) <78DCC358-2FC1-302E-B395-0155B47CB547> /usr/lib/system/libunwind.dylib
    0x7fff8eace000 -     0x7fff8ead5ff3  libcopyfile.dylib (103) <5A881779-D0D6-3029-B371-E3021C2DDA5E> /usr/lib/system/libcopyfile.dylib
    0x7fff8ec1d000 -     0x7fff8ec51fff  libssl.0.9.8.dylib (50) <B15F967C-B002-36C2-9621-3456D8509F50> /usr/lib/libssl.0.9.8.dylib
    0x7fff8efdd000 -     0x7fff8efdefff  libffi.dylib (18.1) <FEB76C94-97BA-39BC-B713-D086B9757BA5> /usr/lib/libffi.dylib
    0x7fff903fb000 -     0x7fff90404ff3  libsystem_notify.dylib (121) <52571EC3-6894-37E4-946E-064B021ED44E> /usr/lib/system/libsystem_notify.dylib
    0x7fff91afe000 -     0x7fff91cabf27  libobjc.A.dylib (551.1) <AD7FD984-271E-30F4-A361-6B20319EC73B> /usr/lib/libobjc.A.dylib
    0x7fff924d0000 -     0x7fff924f9ff7  libc++abi.dylib (48) <8C16158F-CBF8-3BD7-BEF4-022704B2A326> /usr/lib/libc++abi.dylib
    0x7fff9305f000 -     0x7fff93060ff7  libsystem_blocks.dylib (63) <FB856CD1-2AEA-3907-8E9B-1E54B6827F82> /usr/lib/system/libsystem_blocks.dylib
    0x7fff93075000 -     0x7fff93076ff7  libSystem.B.dylib (1197.1.1) <BFC0DC97-46C6-3BE0-9983-54A98734897A> /usr/lib/libSystem.B.dylib
    0x7fff9307b000 -     0x7fff9307cfff  com.apple.TrustEvaluationAgent (2.0 - 25) <334A82F4-4AE4-3719-A511-86D0B0723E2B> /System/Library/PrivateFrameworks/TrustEvaluationAgent.framework/Versions/A/TrustEvaluationAgent
    0x7fff9307d000 -     0x7fff93082fff  libmacho.dylib (845) <1D2910DF-C036-3A82-A3FD-44FF73B5FF9B> /usr/lib/system/libmacho.dylib
Sample analysis of process 71737 written to file /dev/stdout

This is a really old issue that just cropped up again. The list time we worked around it by going back to python 2.6, but this time it's not an option. It's potentially a bug in Python, but since twisted is involved I thought I'd start here.

So far it doesn't seem to make a big difference which executables are actually run - I used launchctl and pmset since it's the ones we are actually using.

Attachments (1)

repro.py (495 bytes) - added by Glyph 5 years ago.
simpler reproduction of the issue

Download all attachments as: .zip

Change History (17)

comment:1 Changed 5 years ago by Jean-Paul Calderone

We could probably work around this CPython (OS X only?) by not using SIGCHLD for child process completion notification on OS X. I think that OS X KQueue supports child event notification.

comment:2 Changed 5 years ago by Itamar Turner-Trauring

The traceback plus some googling (http://bugs.python.org/issue11768 is what I found, presumably a different bug though) suggests the bug may be something like signal handler not being re-entrant for some reason and you're getting SIGCHLD just in the C code handling SIGCHLD. Try disabling SIGCHLD and just calling "twisted.internet.process.reapAllProcesses()" a few times a second and see if that's a good workaround - if so, add a note to the bug. If that is the case you may be able to reproduce the bug by setting a SIGCHLD handler and then sending SIGCHLD to the process a lot, no Twisted involved.

comment:3 in reply to:  2 Changed 5 years ago by Glyph

Replying to itamar:

The traceback plus some googling (http://bugs.python.org/issue11768 is what I found, presumably a different bug though) suggests the bug may be something like signal handler not being re-entrant for some reason and you're getting SIGCHLD just in the C code handling SIGCHLD. Try disabling SIGCHLD and just calling "twisted.internet.process.reapAllProcesses()" a few times a second and see if that's a good workaround - if so, add a note to the bug. If that is the case you may be able to reproduce the bug by setting a SIGCHLD handler and then sending SIGCHLD to the process a lot, no Twisted involved.

Why do you say "presumably a different bug though"? Looking at that report I couldn't see what minor version of Python that's been fixed in, or even if a fix has been in a release. OS X's Python is 2.7.5, and there's at least one new one since then…

comment:4 Changed 5 years ago by Itamar Turner-Trauring

The fix was applied to 2.7 branch in 2011, Python 2.7.5 was released in 2013.

comment:5 in reply to:  4 Changed 5 years ago by Glyph

Replying to itamar:

The fix was applied to 2.7 branch in 2011, Python 2.7.5 was released in 2013.

Interesting, thanks for correlating those.

Perhaps the bug wasn't fully fixed, then? Because the nature of the traceback looks really similar.

comment:6 in reply to:  1 Changed 5 years ago by Glyph

Replying to exarkun:

We could probably work around this CPython (OS X only?) by not using SIGCHLD for child process completion notification on OS X. I think that OS X KQueue supports child event notification.

https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/kqueue.2.html calls this EVFILT_PROC/NOTE_EXIT/NOTE_EXITSTATUS.

Note that even on other reactors (particularly CFReactor, but any GUI-enabled one) we can add the child-monitoring kqueue object's fileno() as a regular reader.

Changed 5 years ago by Glyph

Attachment: repro.py added

simpler reproduction of the issue

comment:7 Changed 5 years ago by Glyph

The original reproduction of the issue was pretty minimal, but pmset and launchctl are distractions, as is the fact that two separate executables were being launched. I managed to reproduce the hang with the external executable just being true, no inlineCallbacks, no secondary program. The hang is super non-deterministic and often takes a few minutes.

We should also see if this eventually hangs on Linux.

comment:8 Changed 5 years ago by Glyph

orestis, if I can ask, what version of OS X did you upgrade from / to where you started seeing this issue?

comment:9 Changed 5 years ago by Jean-Paul Calderone

Description:    Debian GNU/Linux 7.1 (wheezy)
Release:        7.1

running twisted [twisted, version 13.2.0+r41693] on python 2.7.3 (default, Jan  2 2013, 13:56:14)

Linux top 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2 x86_64 GNU/Linux

model name      : Intel(R) Celeron(R) CPU          900  @ 2.20GHz

I ran the minimal reproducer for 3000+ iterations. I also modified it to run 120 processes instead of 80 and ran the modified version concurrently with the original for 1700+ iterations. I didn't observe any unusual behavior.

comment:10 Changed 5 years ago by orestis

Thanks for all the effort on this.

I have seen this bug in 10.7, 10.8, 10.9. The only reliable way to work around it is using the stock OS X 2.6 Python in those distributions. I believe I saw it also on 2.5 in 10.7, but it's been a couple of years.

The only reason I filed the bug now is that I forgot about it until I bumped Python to 2.7 and this resurfaced. I searched our internal Campfire logs and came upon the repro script I posted.

When asking to "disable SIGCHLG", do you refer to somehow intercepting _handleSignals in posixbase.py and not calling _SIGCHLDWaker.install() or just using the stdlib signals or twisted _signals to go back to the default SIGCHLD handler?

comment:11 Changed 5 years ago by Jean-Paul Calderone

reactor.run(installSignalHandlers=False)

comment:12 Changed 4 years ago by Glyph

Owner: set to Glyph
Summary: Deadlock when spawning a lot of processesDeadlock when spawning a lot of processes, spawnProcess hangs the reactor

I filed #7525 when I found the reproducer lying around on my hard drive and couldn't locate the original. Somehow this whole ticket managed to go on for 11 comments and pages and pages of debug output without saying spawnProcess once ;-).

comment:13 Changed 4 years ago by Glyph

Priority: normallow

To follow up on this though, it looks like Apple updated Python to 2.7.5 on the latest Mavericks update (10.9.4), which means that this is now no longer an issue on current OS X.

I'll run the reproducer for another couple of hours, but I'm over 500 ticks of the reproducer script now where it used to hang pretty regularly at 200 or so.

(This is really a Python bug, and adding a workaround now only helps us on outdated versions which no major platforms ship, so I'm going to drop the priority a bit. Please correct this if my facts are mistaken.)

comment:14 Changed 4 years ago by Glyph

Cc: orestis added

orestis, I'm especially curious if you continue to see this in your environments after updating them, and if so, which ones?

comment:15 Changed 4 years ago by Glyph

Priority: lownormal

Nevermind on the priority reduction. I let it run for a while longer and it died at iteration 894.

comment:16 Changed 2 years ago by Glyph

Resolution: wontfix
Status: newclosed

This is no longer an issue on any current platform so I'm going to say it would be a waste of time working around it. If someone wants to submit a full-functional fix we could reopen, but closing for now.

Note: See TracTickets for help on using tickets.