[Twisted-Python] Question about processes in python
glyph at twistedmatrix.com
Mon Apr 12 14:23:31 EDT 2010
On Apr 12, 2010, at 12:06 PM, David Ripton wrote:
> On 2010.04.12 09:39:21 -0600, Jason J. W. Williams wrote:
>> Haven't had any issues yet. Twisted imports occur inside the process
>> function. The app was originally written as a purely blocking
>> multiprocessing app and rewritten to use Twisted inside the
>> sub-processes. It's passed all automated and hand tests without an
>> issue. Is there a reason importing Twisted inside sub-process should
>> not work?
> Here's JP's canonical answer:
> I've seen this problem in real code. We had a PyGTK + Twisted program
> that erroneously used subprocess in one place. 2% of the time, it
> caused an exception. 98% of the time, it worked fine. Classic race
> condition. Could be you have a similar bug but it never actually
> manifests on your combination of code, OS, and hardware. Hard to say.
I've noted this in a comment on the stackoverflow answer, but it bears repeating:
This was a long-standing bug in Twisted which has since been fixed on trunk, although the fix isn't in a release yet: <http://twistedmatrix.com/trac/ticket/733>. Starting with the next release (Twisted 10.1, we hope), you should be able to do this without getting this particular type of error.
Still, I wouldn't recommend using the subprocess module in a Twisted application, and multiprocessing even less. 'subprocess' uses select(), which means that if you are running processes in a server handling a large number of connections, with a reactor that you've selected for that job, you will occasionally notice that '.communicate()' blows up because your file descriptors are too big to fit into a select() call. Its handling of EINTR and EAGAIN is less consistent than spawnProcess's, and you can't independently handle subprocess termination and subprocess-closing-its-output, so certain types of bugs are much trickier to track down... and I'm sure there are other issues, but I haven't had an opportunity to thoroughly audit it.
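To make the select() limit concrete, here is a minimal sketch using only the standard library. It assumes CPython on a POSIX system, where select.select() rejects any descriptor number at or above FD_SETSIZE (commonly 1024) before it even makes the system call. The helper name fits_in_select and the HIGH_FD value are made up for illustration; HIGH_FD stands in for the kind of descriptor number a busy server could be holding.

```python
import os
import select


def fits_in_select(fd):
    """Return True if select() can watch fd, False if fd is out of range."""
    try:
        # Timeout 0 means "poll once and return immediately".
        select.select([fd], [], [], 0)
    except ValueError:
        # CPython on POSIX raises ValueError for descriptors >= FD_SETSIZE.
        return False
    return True


# A freshly created pipe gets a small descriptor number, so it fits.
read_end, write_end = os.pipe()
low_ok = fits_in_select(read_end)

# Hypothetical descriptor number, chosen to be far above FD_SETSIZE.
HIGH_FD = 100000
high_ok = fits_in_select(HIGH_FD)

os.close(read_end)
os.close(write_end)
print(low_ok, high_ok)
```

In a real server nothing fails while descriptor numbers stay small, which is why this kind of breakage only shows up occasionally and under load.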
Multiprocessing has its own subtly *differently* wonky implementation of subprocess spawning (it uses os.fork(), not the subprocess module). It also uses pickle to move objects between processes, and it seems to spawn threads internally, which means it depends on correct thread/process (and hey, maybe thread/pickle too) interaction.
Both of these will work reasonably well for small applications; multiprocessing is *great* if you have a simple, straightforward little multithreaded thing that you want to make multiprocess in order to take advantage of multiple cores. But by the time you've already taken the trouble to learn how to use Twisted, IMHO spawnProcess is a lot more powerful and a lot less trouble than either of these solutions. Even more so if you use a higher-level abstraction over it, like ampoule: <https://launchpad.net/ampoule>.