[Twisted-Python] epoll keep sharing state between process even after fork.

exarkun at twistedmatrix.com
Thu Oct 24 06:07:46 MDT 2013


On 11:19 am, itamar at itamarst.org wrote:
>On 10/23/2013 12:50 PM, Phil Mayers wrote:
>>
>>This is a multiprocessing bug IMHO.
>
>This issue with multiprocessing appears in other places too. E.g. if 
>you're using stdlib logging, child processes will try to rotate the 
>parent process logs.
>
>Basically multiprocessing on Unix is utterly broken and should never be 
>used (except in the fork+exec form in Python 3.4).

To expand on that just a bit, the form of sharing that you get when you 
fork() but you don't exec() is very difficult to use correctly (I think 
it's an open question whether it's *possible* to use correctly in a 
Python program).

The argument here is similar to the argument against shared-everything 
multithreading.  While memory (and some other per-process state) is no 
longer shared after fork(), *some* per-process state is still shared. 
And all of the state that isn't shared is still a potential source of 
bugs since it's almost certainly the case that none of it cooperated 
with the fork() call - a call which happened at some arbitrary time and 
captured a snapshot of all the state in memory at an arbitrary point.
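
To make the shared/unshared split concrete, here is a minimal sketch 
(Unix only, standard library only; the name fork_sharing_demo is made up 
for illustration).  The child modifies a Python list and reads from an 
inherited file descriptor.  The parent never sees the list change, 
because memory was copied, but it does see the file offset move, because 
both descriptors refer to the same open file.

```python
import os
import tempfile


def fork_sharing_demo():
    """Return (counter seen by parent, file offset seen by parent)."""
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)  # start both processes at offset 0

        counter = [0]
        pid = os.fork()
        if pid == 0:
            # Child: this only changes the child's private copy of memory.
            counter[0] += 1
            # This moves the offset of the open file shared with the parent.
            os.read(fd, 4)
            # Skip interpreter cleanup so nothing inherited gets flushed.
            os._exit(0)

        os.waitpid(pid, 0)
        return counter[0], os.lseek(fd, 0, os.SEEK_CUR)


if __name__ == "__main__":
    print(fork_sharing_demo())  # the parent's counter is unchanged, the offset is not
```

The file offset is only one example of state that stays shared; the 
point is that the child can change it without the parent doing anything.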

Consider a simple implementation of a lock file, used to prevent 
multiple instances of a program from starting.  There are several ways 
fork() could break such code.  Perhaps the program is partway through 
acquiring a lock on the lock file when the fork() occurs.  Perhaps the 
result is that the file ends up locked but no process thinks it is 
holding the lock.  Now no instances of the program are running, yet the 
lock file says otherwise.  Or perhaps the lock is held when fork() 
happens and the problem only surfaces at unlock time.  Perhaps one of 
the processes releases the lock when it exits.  Now the program is still 
running but the lock isn't held.

And that's just one of the simplest possible examples of how things can 
go wrong.
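
The unlock-time failure can be reproduced in a few lines.  This is only 
a sketch under stated assumptions: it uses flock() via the fcntl module, 
and it assumes the lock belongs to the open file shared across fork() 
(as it does on Linux), so an explicit unlock in the child also drops the 
parent's lock.  The helper names lock_is_free and unlock_in_child_demo 
are hypothetical.

```python
import fcntl
import os
import tempfile


def lock_is_free(path):
    """Return True if a fresh open of `path` can take an exclusive lock."""
    with open(path, "a") as probe:
        try:
            fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        fcntl.flock(probe, fcntl.LOCK_UN)
        return True


def unlock_in_child_demo():
    """Return (lock held before fork, lock held after child cleanup)."""
    fd, path = tempfile.mkstemp()
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # the parent takes the lock
        held_before = not lock_is_free(path)

        pid = os.fork()
        if pid == 0:
            # Child: well-meaning cleanup code releases "its" lock.
            fcntl.flock(fd, fcntl.LOCK_UN)
            os._exit(0)

        os.waitpid(pid, 0)
        # The parent is still running and still thinks it holds the lock.
        held_after = not lock_is_free(path)
        return held_before, held_after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(unlock_in_child_demo())
```

Other locking primitives (fcntl()/lockf() record locks, for instance) 
behave differently across fork(), which only adds to the number of cases 
to worry about.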

The nearly uncountable ways for failures to creep in, and the 
resulting impracticality (if not impossibility) of testing that Twisted 
(or any Python library) actually works when fork() is used, mean that 
it's not likely Twisted will ever be declared compatible with any 
fork()-without-exec() usage.

You can find some examples of Twisted-using applications that run 
multiple processes, though.  Apple CalendarServer does it by passing 
file descriptors to worker processes and sending them the location of a 
configuration file describing how they should behave.  Divmod Mantissa 
does it by inserting self-describing work into a SQLite3 database.  When 
the worker process finds one of these, it knows what code to load and 
run by looking at the fields in the row.  These are variations on a 
theme - RPC, not shared (or duplicated) memory.
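
As a rough illustration of that theme (not how CalendarServer or 
Mantissa actually do it), here is a sketch in which the parent starts a 
fresh interpreter with fork+exec via subprocess and sends it 
self-describing jobs as JSON lines.  It uses only the standard library, 
not Twisted's process support, and the job format, the task names and 
the run_jobs helper are all invented for the example.

```python
import json
import subprocess
import sys

# Source for the worker.  It runs in a freshly exec'd interpreter, so it
# inherits none of the parent's in-memory state.
WORKER_SOURCE = r"""
import json, sys
HANDLERS = {"add": lambda a, b: a + b, "upper": lambda s: s.upper()}
for line in sys.stdin:
    job = json.loads(line)
    result = HANDLERS[job["task"]](*job["args"])
    sys.stdout.write(json.dumps({"id": job["id"], "result": result}) + "\n")
    sys.stdout.flush()
"""


def run_jobs(jobs):
    """Send each job to a worker process and collect its replies in order."""
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    try:
        for job in jobs:
            # Each job names the task to run and carries its own arguments.
            worker.stdin.write(json.dumps(job) + "\n")
            worker.stdin.flush()
            replies.append(json.loads(worker.stdout.readline()))
    finally:
        worker.stdin.close()  # EOF on stdin tells the worker to exit
        worker.stdout.close()
        worker.wait()
    return replies


if __name__ == "__main__":
    print(run_jobs([
        {"id": 1, "task": "add", "args": [2, 3]},
        {"id": 2, "task": "upper", "args": ["hi"]},
    ]))
```

Everything the worker needs arrives in the message itself, so nothing 
depends on what happened to be in the parent's memory when the worker 
was started.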

Hope this helps,
Jean-Paul


