[Twisted-Python] Using Twisted for distributed computation / experiment running?
robomancer at gmail.com
Tue Apr 3 12:13:45 EDT 2007
I'm looking to use Twisted for distributing computation over a small
number (~10) of PCs. I'm wondering if anyone else has some experience
with this -- particularly if there is already a solution out there
that I can use, so that I'm not reinventing the wheel. Here's a rough
outline of what I'd like:
Setup phase: given a config file containing a list of machines and the
number of CPUs on each machine, update the source code on each machine and
start an appropriate number of experiment runners.
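
To make the setup phase concrete, here is a minimal sketch of what I have in mind. Everything specific in it is an assumption made up for illustration: the "hostname cpu_count" config format, the "experiments" directory, the "runner.py" script and its "--slot" flag, and "svn update" as the way to refresh the source. It only builds the ssh command lines; it does not run them.

```python
import shlex

# Hypothetical config format: one "hostname cpu_count" pair per line.
EXAMPLE_CONFIG = """\
# host   cpus
node01   2
node02   4
"""


def parse_config(text):
    """Return a list of (host, cpus) tuples, skipping blanks and comments."""
    machines = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        host, cpus = line.split()
        machines.append((host, int(cpus)))
    return machines


def setup_commands(machines, src_dir="experiments", runner="runner.py"):
    """Build (but do not execute) the ssh command lines for every machine.

    src_dir, runner and --slot are placeholder names, and "svn update"
    stands in for whatever actually updates the source code.
    """
    commands = []
    for host, cpus in machines:
        src = shlex.quote(src_dir)
        commands.append(["ssh", host, "cd %s && svn update" % src])
        # One runner per CPU on this machine.
        for slot in range(cpus):
            start = "cd %s && python %s --slot %d" % (
                src, shlex.quote(runner), slot)
            commands.append(["ssh", host, start])
    return commands
```

With passwordless ssh keys in place, each of these lists could be handed to something like subprocess, or to Twisted's process support, to actually run it.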
Run phase: a "master" process assigns an experiment to each runner.
When we get a result back, log the result to a file and send a new
experiment to that runner. Repeat until all experiments are done.
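
The run phase boils down to a small piece of bookkeeping, sketched below in plain Python with no networking at all. The Master class, its method names, and the simulate() driver are all hypothetical; with PB I imagine result_received() being hooked up as the callback on the Deferred that callRemote() returns, but that part is not shown here.

```python
import io
from collections import deque


class Master:
    """Hands out experiments one at a time and logs each result."""

    def __init__(self, experiments, log):
        self.pending = deque(experiments)  # experiments not yet assigned
        self.busy = {}                     # runner id -> experiment in flight
        self.log = log                     # any file-like object

    def runner_ready(self, runner_id):
        """Return the next experiment for this runner, or None if none left."""
        if not self.pending:
            return None
        experiment = self.pending.popleft()
        self.busy[runner_id] = experiment
        return experiment

    def result_received(self, runner_id, result):
        """Log the result, then give the same runner a new experiment."""
        experiment = self.busy.pop(runner_id)
        self.log.write("%r\t%r\n" % (experiment, result))
        return self.runner_ready(runner_id)

    def done(self):
        return not self.pending and not self.busy


def simulate(experiments, runner_ids, run):
    """In-process stand-in for the network: runners answer in FIFO order."""
    log = io.StringIO()
    master = Master(experiments, log)
    in_flight = deque()
    for runner_id in runner_ids:
        experiment = master.runner_ready(runner_id)
        if experiment is not None:
            in_flight.append((runner_id, experiment))
    while in_flight:
        runner_id, experiment = in_flight.popleft()
        following = master.result_received(runner_id, run(experiment))
        if following is not None:
            in_flight.append((runner_id, following))
    assert master.done()
    return log.getvalue()
```

For example, simulate([1, 2, 3], ["a", "b"], lambda x: x * x) runs three toy "experiments" on two pretend runners and returns the log as a string.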
Here are my constraints:
1) The high-level code (at least) is all in Python, so the experiment
runners can collect their results by just calling Python functions.
2) I can set up ssh keys on each machine such that logging in remotely
can happen without a password.
3) I don't really have to worry about authentication: I can assume
that all machines are either on a non-internet-connected LAN or that
firewall rules are set up so that the ports aren't accessible except
from the "master" machine.
4) I need to be able to add and remove compute nodes at runtime, so I
need some sort of admin shell. However, I can wait for
currently-processing experiments to finish, so I don't have to worry
about the complexity of restarting experiments or migrating them.
5) It'd be nice (but not required) if the experiment runners could all
log some critical messages to the master process.
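
For constraint 4, the bookkeeping I picture is roughly the following: a node that is removed while busy is only marked as draining, and it leaves the pool once its current experiment finishes. The RunnerPool class and its method names are made up for this sketch, and the admin shell that would call add() and remove() is left out.

```python
class RunnerPool:
    """Tracks which runners are idle, busy, or waiting to be removed."""

    def __init__(self):
        self.idle = set()
        self.busy = set()
        self.draining = set()  # busy runners that leave when they finish

    def add(self, runner_id):
        """Register a new runner; it can take work immediately."""
        self.idle.add(runner_id)

    def remove(self, runner_id):
        """Request removal. Returns True if the runner is gone right away."""
        if runner_id in self.idle:
            self.idle.remove(runner_id)
            return True
        if runner_id in self.busy:
            # Let the current experiment finish; no restart or migration.
            self.draining.add(runner_id)
        return False

    def assign(self):
        """Pick an idle runner and mark it busy; None if nobody is idle."""
        if not self.idle:
            return None
        runner_id = min(self.idle)  # deterministic choice for the sketch
        self.idle.remove(runner_id)
        self.busy.add(runner_id)
        return runner_id

    def finished(self, runner_id):
        """Call when a result arrives. Returns True if the runner stays."""
        self.busy.remove(runner_id)
        if runner_id in self.draining:
            self.draining.remove(runner_id)
            return False
        self.idle.add(runner_id)
        return True
```

The master would only send a new experiment to a runner when finished() returns True.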
This seems like it would only take a few hours to implement in Twisted
(probably with PB), but I wanted to make sure I'm not reinventing the
wheel, because it seems likely that someone has done this before.