Opened 5 years ago

Last modified 3 years ago

#4347 defect new

Creating threadpool and then daemonizing leads to uninformative hangs or crashes

Reported by: ivank Owned by:
Priority: high Milestone:
Component: core Keywords: threads
Cc: Branch:
Author: Launchpad Bug:

Description

At least two users created a ThreadPool for their WSGIResource, but they wrote their .tac files so that the ThreadPool was instantiated before twistd daemonized. After twistd daemonized, they experienced that their WSGIResource was unresponsive or that twistd crashed. I *think* there's an example of a incorrectly-written .tac file at http://paste.pocoo.org/show/185480/ (note the strace linked below is from different code)

<mezo> I run django on twisted 10.0.0, I observe strange behaviour. If twistd is a daemon : http request hang under linux (twistd crash under mac os X), whereas when twistd isn't daemonize everything works correctly. 
<mezo> more information here http://pastebin.com/VaRcLyeR (with strace output)

Assuming my diagnosis is correct (I made some assumptions), it would be nice if there were an informative error message printed at some point. For example, before forking to daemonize, twistd could raise an error if the process owns threads. If this is impractical, this bug might just be a documentation bug.

Change History (7)

comment:1 Changed 5 years ago by ivank

created a ThreadPool and start()ed it, I meant.

comment:2 follow-up: Changed 5 years ago by exarkun

If Python exposed an atfork hook, the ThreadPool could fix itself before and after and this wouldn't be an issue.

Or, if ThreadPool were a service and idiomatic usage did not involve starting it manually, then usage would probably be less error prone.

comment:3 in reply to: ↑ 2 Changed 5 years ago by glyph

Replying to exarkun:

If Python exposed an atfork hook, the ThreadPool could fix itself before and after and this wouldn't be an issue.

-1. Thread/process interaction is just a bad scene. We don't want to do more of it than we have to.

Or, if ThreadPool were a service and idiomatic usage did not involve starting it manually, then usage would probably be less error prone.

+1. Conceptually, a threadpool is a service-like thing anyway. And the reactor's service-management hooks were originally developed in support of threadpools; the first thing a deferred-returning system event trigger was used for was allowing worker threads to gracefully exit before shutdown so the process wouldn't hang.

And now for something completely different. This is mostly irrelevant to this ticket, but I wanted to record it somewhere because it occurred to me while commenting.

The fact that WSGIResource is pluggable and allows you to implement your own thread-queuing logic is a good thing, but the fact that you need to be aware of this ugly implementation detail when you're writing a basic "let's-use-WSGI" configuration file is a problem. We really ought to provide some convenience interface which hooks up everything for you so that you don't need to be aware of threadpool management in order to have a WSGI site. (Ideally, such a convenience interface would be trivial to switch over to multiprocessing instead of multithreading via a simple option rather than via importing a whole constellation of additional classes.)

I don't really have any concrete solutions to propose, but I've been thinking that if we had explicit testing support at multiple levels of the API (i.e. if you want to test HTTP you use HTTP-testing interfaces, not reactor-testing interfaces) it would be a lot easier to support applications doing testing without having to remember and construct a ton of different lower-level parameters for everything.

comment:4 follow-up: Changed 5 years ago by exarkun

So, the short version, make ThreadPool implement IService and somehow encourage people to start and stop it that way, rather than the current way. Perhaps by updating all the examples of using it, or by writing explicit howto docs which say this, or deprecating the old methods.

This should be a pretty straightforward refactoring + documentation task.

comment:5 in reply to: ↑ 4 Changed 4 years ago by glyph

Replying to exarkun:

So, the short version, make ThreadPool implement IService

This is a good idea, but IService lives too far away and too high up the dependency chain for ThreadPool to import it and depend on it. This is almost a semantic quibble though, as a wrapper (let's say, twisted.application.threads.ThreadPoolService) would be functionally equivalent.

and somehow encourage people to start and stop it that way, rather than the current way.

This should definitely be done, but it's only part of the problem. You might want to instantiate a WSGIResource when you no longer have access to a service hierarchy; only a reactor. For example, in a .rpy. At that point, your options (as I see them) are:

  • use the reactor threadpool, hope it's big enough, and that your WSGI application isn't badly-behaved enough to interfere with name resolution or other application tasks.
  • make your own threadpool and hook it up with addSystemEventTrigger directly.
  • do some other crazy thing like adding an atexit hook

The bottom line is that what you are likely to do is not realize that you need to do any of this, and end up with a server that hangs on exit.

On the WSGI front, it should probably be dealt with specifically, and we should deploy WSGIs using '.wsgi' files, so users just don't have to deal with threadpools directly at all. Still, that involves code running well after startup, and so in order to implement that we'll have to figure out where a decent threadpool is.

My thinking is we should have twisted.application.service.addToReactor(reactor, service) which does something like this:

def addToReactor(reactor, service):
    reactor.callWhenRunning(service.startService)
    triggerID = reactor.addSystemEventTrigger("before", "shutdown", service.stopService)
    return AllowYouToStopItEarly(reactor, service, triggerID)
class AllowYouToStopItEarly:
    implements(IService)
    def stopService(self):
        d = self.service.stopService()
        # ... add a distinct system event trigger so this Deferred will be waited upon
        # if reactor.stop() is called before it fires ...
        self.reactor.removeSystemEventTrigger(self.triggerID)
        return d

Perhaps by updating all the examples of using it, or by writing explicit howto docs which say this, or deprecating the old methods.

This should be a pretty straightforward refactoring + documentation task.

Again, I agree, and none of the not-entirely-straightforward stuff I just said should detract from this smaller simpler task getting accomplished. You still might want to instantiate a WSGIResource explicitly for various reasons even if we did have more convenient drop-stuff-in-the-filesystem deployment mechanisms lying around, and that documentation and workflow should still be as easy to get right as possible.

comment:6 Changed 4 years ago by <automation>

  • Owner glyph deleted

comment:7 Changed 3 years ago by glyph

#5298 addresses some similar problems and there is some controversy as to whether it's a duplicate of this issue.

Note: See TracTickets for help on using tickets.