Opened 10 months ago

Last modified 10 months ago

#9562 defect new

Twistd is not collecting child processes when run in Daemon mode

Reported by: saunderp
Owned by:
Priority: high
Milestone:
Component: core
Keywords:
Cc:
Branch:
Author:

Description (last modified by saunderp)

I have tested this with twistd 18.7 on Python 3.6 and Python 2.7.5.

The attached code simply runs a command (curl) and returns the result when the command exits. When twistd is run in the foreground, everything works as expected. However, when twistd is run in daemon mode, processExited is never called. It looks like something else is reaping the child without passing this information back to twistd.

Running strace on the process in foreground mode shows the following once the child exits:

3591  exit_group(0)                     = ?
3591  +++ exited with 0 +++
3583  <... epoll_wait resumed> [{EPOLLERR, {u32=4, u64=10860585532543467524}}, {EPOLLHUP, {u32=8, u64=10860585532543467528}}, {EPOLLHUP, {u32=10, u64=10860585532543467530}}], 5, -1) = 3
3583  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3591, si_uid=1539607, si_status=0, si_utime=3, si_stime=4} ---
3583  write(9, "\0", 1)                 = 1
....
3583  write(1, "2018-11-12T07:10:49+0000 [stdout#info] Connection lost 2\n", 57) = 57
3583  wait4(3591, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 3591

However, when run in daemon mode:

3611  exit_group(0)                     = ?
3613  <... epoll_wait resumed> [{EPOLLERR, {u32=4, u64=9777629251346890756}}, {EPOLLHUP, {u32=8, u64=9777629251346890760}}, {EPOLLHUP, {u32=10, u64=9777629251346890762}}], 5, -1) = 3
3611  +++ exited with 0 +++
3613  epoll_ctl(5, EPOLL_CTL_DEL, 4, 0x7ffed1133ff0) = 0
3613  fcntl(4, F_GETFL)                 = 0x801 (flags O_WRONLY|O_NONBLOCK)
3613  fcntl(4, F_SETFL, O_WRONLY)       = 0
....
3613  write(3, "2018-11-12T07:11:02+0000 [stdout#info] Connection lost 2\n", 57) = 57
3613  wait4(3611, 0x7ffed1133a14, WNOHANG, NULL) = -1 ECHILD (No child processes)

I believe this is because, when running in daemon mode, the child process's parent PID becomes 1, so init/systemd reaps it. When running in foreground mode, the child's parent is the twistd process.

As a Daemon:

myuser     7118     1  6 08:14 pts/2    00:00:00   /usr/bin/curl -x http://proxy:8080 https://www.redhat.com/security/data/oval/com.redhat.rhsa-all.xml
myuser     7120     1  0 08:14 ?        00:00:00   /apps/myuser/py2/bin/python /apps/myuser/py2/bin/twistd -y poc.py

In the foreground:

myuser     7292 29897  3 08:16 pts/2    00:00:00             /apps/myuser/py2/bin/python /apps/myuser/py2/bin/twistd -ny poc.py
myuser     7300  7292  1 08:16 pts/2    00:00:00               /usr/bin/curl -x http://proxy:8080 https://www.redhat.com/security/data/oval/com.redhat.rhsa-all.xml
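The ECHILD result in the daemon-mode trace can be reproduced outside Twisted. The following is a minimal sketch, not twistd's actual code: it assumes the child was started before the daemonizing fork (the explanation suggested later in this ticket), and the function name `wait_from_forked_copy` is made up for illustration. Unlike a real daemonization, the original process stays alive here so that it can report the result.

```python
import os


def wait_from_forked_copy():
    """Spawn a child, fork again (as daemonizing does), and try to
    reap the first child from the forked copy."""
    child_pid = os.fork()
    if child_pid == 0:
        os._exit(0)  # stand-in for the curl process

    read_fd, write_fd = os.pipe()
    daemon_pid = os.fork()
    if daemon_pid == 0:
        # This copy plays the role of the daemonized twistd process.
        # child_pid is its sibling, not its child, so it cannot be waited on.
        os.close(read_fd)
        try:
            os.waitpid(child_pid, os.WNOHANG)
            result = b"reaped"
        except ChildProcessError:
            result = b"ECHILD"
        os.write(write_fd, result)
        os._exit(0)

    # Original process: collect the report, then clean up both children.
    os.close(write_fd)
    result = os.read(read_fd, 16).decode()
    os.close(read_fd)
    os.waitpid(daemon_pid, 0)
    os.waitpid(child_pid, 0)
    return result


if __name__ == "__main__":
    print(wait_from_forked_copy())
```

On a POSIX system the forked copy should report ECHILD, matching the wait4 failure in the daemon-mode strace. In a real daemonization the original process exits, which is when the orphaned child is reparented to PID 1.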

Attachments (1)

poc.py (3.1 KB) - added by saunderp 10 months ago.

Change History (7)

Changed 10 months ago by saunderp

Attachment: poc.py added

comment:1 Changed 10 months ago by saunderp

Description: modified (diff)

comment:2 Changed 10 months ago by saunderp

Just to add, I straced systemd, and it does appear that it is collecting the child process as suspected.

comment:3 Changed 10 months ago by Jean-Paul Calderone

My guess is that the behavior is caused by calling spawnProcess *before* twistd daemonizes. A better behaved tac file would delay the spawnProcess call until startService is called.

It might be nice if spawnProcess delayed actually launching the process until the "right" time. It probably could by providing an alternate IReactorProcess that knows about how twistd works (i.e., knows that spawnProcess will be broken until startService-time) and queues up the operation until it can succeed.

However, a much simpler solution to the problem is just to put the spawnProcess-calling code into an IService and respond to the startService event instead of running the call during the evaluation of the tac file itself.
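The queueing idea mentioned above can be illustrated without Twisted. This is only a sketch of the concept: `QueuedSpawner` and its methods are hypothetical names, it uses the standard-library `subprocess` module rather than `IReactorProcess`, and it is not an existing Twisted API. Spawn requests made before `start()` are held back and launched once the host process says it is ready (in twistd terms, after daemonization).

```python
import subprocess
import sys


class QueuedSpawner:
    """Hypothetical helper: defer process launches until start() is called."""

    def __init__(self):
        self._started = False
        self._pending = []   # argv lists requested before start()
        self.processes = []  # Popen objects for launched processes

    def spawn(self, argv):
        """Launch argv now if started, otherwise queue it for later."""
        if self._started:
            self.processes.append(subprocess.Popen(argv))
        else:
            self._pending.append(argv)

    def start(self):
        """Mark the spawner ready and launch everything that was queued."""
        self._started = True
        pending, self._pending = self._pending, []
        for argv in pending:
            self.processes.append(subprocess.Popen(argv))


if __name__ == "__main__":
    spawner = QueuedSpawner()
    spawner.spawn([sys.executable, "-c", "pass"])  # queued, not yet running
    spawner.start()                                # launched here
    print(spawner.processes[0].wait())
```

Because the launch happens inside `start()`, the children belong to whichever process calls `start()`, which is the property the tac file needs.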

comment:4 Changed 10 months ago by saunderp

Thank you very much - I can confirm that doing a callLater on the download() function does seem to work.

comment:5 in reply to:  4 ; Changed 10 months ago by saunderp

Replying to saunderp:

> Thank you very much - I can confirm that doing a callLater on the download() function does seem to work.

I assume doing a reactor.callWhenRunning is also sufficient?

comment:6 in reply to:  5 Changed 10 months ago by Jean-Paul Calderone

Replying to saunderp:

> Replying to saunderp:
>
> > Thank you very much - I can confirm that doing a callLater on the download() function does seem to work.
>
> I assume doing a reactor.callWhenRunning is also sufficient?

I would think so. The reactor shouldn't be running until after twistd has finished whatever fork and exec work it needs to do (because some reactors are fragile and don't survive that process). IService.startService is really the optimal time though.

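from twisted.application.service import Service


class OVALStarter(Service):
    """Start the OVAL download once twistd has finished daemonizing."""

    def startService(self):
        Service.startService(self)  # mark the service as running
        # CurrentOVAL is presumably the class defined in the attached poc.py.
        c = CurrentOVAL(
            b'https://www.redhat.com/security/data/oval/com.redhat.rhsa-all.xml.bz2',
            'http://proxy:8080'
        )
        # Spawning here makes curl a child of the daemonized process.
        c.download()


# `application` is the Application object from the tac file.
OVALStarter().setServiceParent(application)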